Task #818

Clean up source file headers and set up automatic generation

Added by Szilárd Páll about 9 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: build system
Target version:
Difficulty: uncategorized

Description

There are lots of source files with deprecated headers and it's difficult to keep them up to date manually. There is a pretty good tool called headache that can auto-generate headers based on custom formats.

What needs to be decided is when we do this because it will result in a change on each and every file in the repository.


Related issues

Related to GROMACS - Task #845: Format white space correctly (Closed, 11/25/2011)
Related to Support Platforms - Task #648: Use of commit hook scripts (Closed)

Associated revisions

Revision 123678f4 (diff)
Added by Teemu Murtola almost 8 years ago

Fix license text printed by programs.

Since source files are now licensed under LGPL, it doesn't make much
sense for the programs to indicate they are still under GPL.
Follow-up to I1c507b3, which seems to have missed this.

Part of #818.

Change-Id: I44ffcf75f4840326d09f924249619077ea5a1013

Revision 95761d63 (diff)
Added by Mark Abraham over 7 years ago

Fixing copyright issues and code contributors

Added Mark Abraham, Christoph Junghans and
Carsten Kutzner to the list of code contributors,
per discussion with Erik and Berk.

  • Now that 2013 is a copyrightable year, we need to
    assert that copyright
  • No, we can't just say 1995-2013, unless we also say
    that every year in the range is a copyrightable year. I'm
    not interested in demonstrating after the fact that this is
    true for all the years before I was born. So we may as
    well start listing explicit years.
    Please read "howto (L)GPL" guides before arguing :-)
  • Removed erroneous GROMACS copyright statements on
    unmodified FindBLAS.cmake,FindLAPACK.cmake,
    FindOpenMP.cmake and slightly modified FindGit.cmake
  • Note Szilard's uncertainty about the origin of
    vectype_ops.cuh and thus its copyright status
  • TODO for 4.6.1 deal with material in admin and scripts

References #818

Change-Id: I946200651de7f05c2ff292fcded8f33fca7b356f

Revision 3347654d (diff)
Added by Teemu Murtola over 7 years ago

Fix copyright notices for new C++ code.

Replace the copyright headers with the new headers in all source files.
Add the new header to most build system files and some other files.
May have missed some files that should have the header, but most files
new in the master branch should now have the new header, with a
reasonable set of copyrightable years.

Update the list of years as the years where there have been commits to
those files (excluding most recent copyright and uncrustification
changes). Remove copyright years that predate existence of the code.

Regenerated the selection parser files from source to fix line numbers
caused by different number of lines in the new copyright header.
Add copyright declarations to generated selection parser files.
Some other changes in the generated parser.* caused by Bison updates
(now used 2.6.2, previously had used 2.5).

Part of #818.

Change-Id: I38c18c03b1ee0ff55fd112951d3f741274ad59af

Revision 7f6a0e18 (diff)
Added by Teemu Murtola over 7 years ago

Fix copyright notices for new C++ code.

Replace the copyright headers with the new headers in all source files.
Add the new header to most build system files and some other files.
May have missed some files that should have the header, but most files
new in the master branch should now have the new header, with a
reasonable set of copyrightable years.

Update the list of years as the years where there have been commits to
those files (excluding most recent copyright and uncrustification
changes). Remove copyright years that predate existence of the code.

Regenerated the selection parser files from source to fix line numbers
caused by different number of lines in the new copyright header.
Add copyright declarations to generated selection parser files.
Some other changes in the generated parser.* caused by Bison updates
(now used 2.6.2, previously had used 2.5).

Part of #818.

Change-Id: I38c18c03b1ee0ff55fd112951d3f741274ad59af

Revision b6b93061 (diff)
Added by Teemu Murtola almost 7 years ago

Reformat existing LGPL copyright notices.

Update files that already had the new-style LGPL copyright notice to
match what the new copyright script expects (slightly different author
list). Generated kernels need a somewhat separate approach, so skipped
in this change.

Applied with
git ls-tree --name-only -r HEAD | git check-attr filter --stdin | \
sed -Ene '/(copyright|uncrustify)$/ {s/:.*//;p;}' | \
grep -v '_kernel_' | xargs admin/copyright.py --update-header
with the .gitattributes and copyright.py from Ie21365a.
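
For readers less used to sed, the file-selection stage of the pipeline above can be sketched in Python. This is only an illustration and not part of admin/copyright.py: the name select_paths is hypothetical, and the sketch assumes each input line has the "path: filter: value" form that git check-attr prints.

```python
import re

# Hypothetical helper mirroring the sed/grep stage of the pipeline above.
# Assumes each input line looks like "path: filter: value", as printed by
# `git check-attr filter --stdin`.
_WANTED = re.compile(r"(copyright|uncrustify)$")


def select_paths(check_attr_lines):
    """Return paths whose filter attribute is copyright or uncrustify."""
    paths = []
    for line in check_attr_lines:
        line = line.rstrip("\n")
        if not _WANTED.search(line):
            continue
        path = line.split(":", 1)[0]  # same effect as sed's s/:.*//
        if "_kernel_" in path:        # same effect as grep -v '_kernel_'
            continue
        paths.append(path)
    return paths
```

The returned list corresponds to what xargs passes on to the copyright script.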

Also adapt COPYING to match more or less the contents of the file in
release-4-6.

Part of #818.

Change-Id: I2885b36f65ac3599a7dfb66efdb7faed55fa9557

Revision 7a2a9e32 (diff)
Added by Teemu Murtola almost 7 years ago

Add some missing copyright headers.

Went through files that were modified by
git ls-tree --name-only -r HEAD | git check-attr filter --stdin | \
sed -Ene '/(copyright|uncrustify)$/ {s/:.*//;p;}' | \
grep -v '_kernel_' | xargs admin/copyright.py --add-missing
and updated the copyright years for those where I know the history or
could easily dig it out. Mostly this is CMake build system files, which
came to existence at earliest 2009, so it wasn't too hard to dig out
when the code was actually introduced/changed, using git log and
git log --follow.

Part of #818.

Change-Id: Ibba6d8bacc700efd76b7bc429228d751b8e17a27

Revision 3c598143 (diff)
Added by Teemu Murtola almost 7 years ago

Script to automatically update copyright

Adapt the automatic uncrustification script to also update the
copyright header for changed files. Add a Python script to do the
actual copyright header processing. The Python script can also do other
copyright header tasks. Make the pre-commit hook more flexible to be
able to do either only uncrustify or only copyright checking, as well as
temporarily skipping the hook for a commit.

Related to #818.

Change-Id: Ie21365acbe07e1f097e6d72c6a5e0d0826631ff0

Revision 6a6830c2 (diff)
Added by Teemu Murtola almost 7 years ago

Update copyright headers in generated kernels.

Update copyright headers in generated kernel files to the new format.
Update also some copyright years.

Adjust the generating scripts to reuse the copyright header from
admin/copyright.py to keep it up-to-date. For the Verlet kernels, some
changes may be required if/when we want to add a copyright header also
into the .pre files.

Part of #818.

Change-Id: I6b5060f12a3c469d0080c73bc9d674e074ce44a4

Revision 145a3d64 (diff)
Added by Teemu Murtola almost 7 years ago

Update copyright headers in kernels.

Update copyright headers in kernel files to match the new format.
Since all the kernel code has been introduced in 2012, removed copyright
years older than that. The Python script for Verlet kernels was updated
to strip a copyright header from the templates to allow adding one there
as well. This commit processes all files that are not produced by the
kernel generators, but contain kernel in their path.

Part of #818.

Change-Id: I5d7babc137cdbe0c427cbaa44209b9e8f7be55bb

Revision 8a7cde1d (diff)
Added by Teemu Murtola almost 7 years ago

Adjust more copyright headers

Replace old copyright headers with the new in files that have been moved
to the new module layout. Adjust the script to handle the cases
encountered, and add some exclusions to the attributes.

Put best-guess estimates for the copyright years based on when the files
were created (if recently) and when they have been modified. Unless
obvious that the file was created for 4.6 or otherwise recently, did not
trace back beyond the time the files were moved to their current
locations.

Files under directories moved as a whole from release-4-6
(legacyheaders, gmxlib etc.) have not been considered yet.

Part of #818.

Change-Id: Ida30498a97606ec652008299fc1652ca9609539e

Revision ac25fa90 (diff)
Added by Teemu Murtola over 6 years ago

More copyright header updates

Remove old copyrights from files added recently (for 4.6 or somewhat
earlier, plus some selected files where it was easier to enumerate all
years than to adapt the script to handle their existing copyrights) and
make the copyright headers follow the new format. Put best-guess
estimates for the copyright years based on when the files have been
modified. No effort was spent on trying to track content beyond git
rename detection, so some content may originate from earlier than the
first copyright year mentioned, but this is hopefully not a big deal.

Part of #818.

Change-Id: I44e1eff552bff47a2ae10b3181960a8a2a6753df

Revision 7c2f8ee0 (diff)
Added by Teemu Murtola over 6 years ago

Fix remaining copyright headers

Apply copyright to the remaining files, excluding thread_mpi, with the
copyright year as the year when the last change to that file was made
(excluding other copyright stuff and mass uncrustification).

Part of #818.

Change-Id: Ib70deb8a8e71b23511b68c91e05c1dee821a6d2c

Revision 1ef24f05 (diff)
Added by Teemu Murtola over 6 years ago

Suppress remaining errors from copyright script

- Add a mechanism to the script that allows it to be used only for
uncrustification, but not for copyright checking for certain files.
- Use the mechanism to suppress copyright checking for thread_mpi files.
It's somewhat unclear whether we should just prepend the Gromacs
copyright to these files and get rid of this exception, or whether we
still are planning to maintain the library separately. But until that
is clear, this keeps the script quiet, and the copyright headers need
to be manually maintained if desired.
- Suppress a few files that don't have any copyright header.
- Add copyright header to two source files that were still missing it.

With these changes, running admin/copyright.py on all files indicated by
.gitattributes only produces messages about outdated copyright years
(which it should, for files that have not been modified this year).

Part of #818.

Change-Id: I2cb09b2b30e782244c16faf02625d405dac642b6

Revision 66bc6656 (diff)
Added by Teemu Murtola over 6 years ago

Include CUDA and latex files in copyright check

Now also CUDA and latex files are indicated by the .gitattributes file
to contain copyright, and the copyright.py script can deal with them.
This should conclude the work to reformat copyright headers in the
master branch. Some individual files may still be missing, but those
can be fixed when spotted.

Part of #818

Change-Id: Ib6d3e10b57a42ea7a8c990e51df669cace4c8d8f

Revision 26dbdcc2 (diff)
Added by Teemu Murtola over 6 years ago

Fix copyright check in admin/uncrustify.sh

The automatic copyright year check didn't consider files that were also
uncrustified. Also removed some extra code (that had no effect) from
the script that looks like incorrectly merged or pasted.

Also fix incorrect attribute for thread_mpi/CMakeLists.txt.

Part of #818.

Change-Id: I80ce1b936fa9b77a94eb3dfe8fcff0490a886f42

History

#1 Updated by Teemu Murtola about 9 years ago

If this needs to be done already for 4.6, it should be carefully considered how it will be merged into master, which already has a significant amount of new files and some old ones have been removed (right now, changes are still limited to the selection library). If this is just done in the release-4-6 branch without an immediate merge to master both before and after, it will very likely result in a lot of gray hairs in the next merge.

Easiest would just be to postpone it for 5.0.

#2 Updated by Rossen Apostolov about 9 years ago

  • Category changed from documentation to mdrun
  • Assignee changed from Justin Lemkul to Teemu Murtola

I agree with Teemu. Headers have been a mess for a long time and it's better not to waste effort on that.

#3 Updated by Teemu Murtola about 9 years ago

  • Category changed from mdrun to build system
  • Assignee deleted (Teemu Murtola)
  • Target version changed from 4.6 to 5.0

Changed target release to 5.0, as I also think that that's the easiest way. Also, this is not mdrun-specific, so changed category to "build system" for the lack of a better alternative. And I have no prior knowledge of headache or any other such tool, so I don't think I'm a very good person to take this forward; cleared the assignee.

#4 Updated by Szilárd Páll about 9 years ago

Teemu Murtola wrote:

If this needs to be done already for 4.6, it should be carefully considered how it will be merged into master, which already has a significant amount of new files and some old ones have been removed (right now, changes are still limited to the selection library). If this is just done in the release-4-6 branch without an immediate merge to master both before and after, it will very likely result in a lot of gray hairs in the next merge.

Easiest would just be to postpone it for 5.0.

Good point, let's do that.

#5 Updated by Teemu Murtola over 8 years ago

The clean-up of the headers is mostly straightforward (and can hopefully be automated), but it would be good to first consider what needs to be in the copyright headers. Here's just a list of questions that may be worth addressing:
  • Do we need copyright headers in all files?
  • Should the copyright header contain some version information (this gets very easily out of date)?
  • Should the copyright header contain year numbers (also gets out of date)?
  • Should the copyright header contain names of people, or just generic reference to Gromacs developers? (list of developers could be maintained separately)
  • Should the copyright header contain the license text and/or explanation of the license terms, or just a reference to a separate text file?
  • What should the license text be? Currently some files say GPL, some LGPL, some (at least thread_mpi) are under BSD-style license.
  • Is there some difference between "new" and "old" source files, as Erik implies in https://gerrit.gromacs.org/#/c/1042/? What is "new" and what is not?
  • How should we treat files coming from external sources? E.g., bundled external libraries such as boost, fftpack, lapack/blas/arpack, gtest/gmock, etc.? Do we need to add our own copyright headers if we use the files without modification? How about if we do a few one-line changes? Licenses of the external files of course need to be put somewhere.

Note that already now, there are several versions of the copyright header in our source files, i.e., there are other differences than just outdated version numbers or years.

Added Erik as a watcher, since he seems to have some strong opinions about these issues.

#6 Updated by Szilárd Páll almost 8 years ago

As it has been decided that with 4.6 we are switching to LGPL, I think we should try to come up with a strategy. I would prefer to have clean headers that reflect the move to LGPL and also address the other issues.

I'll start by expressing my opinion on (some of) the questions Teemu raised.

Teemu Murtola wrote:

The clean-up of the headers is mostly straightforward (and can hopefully be automated), but it would be good to first consider what needs to be in the copyright headers. Here's just a list of questions that may be worth addressing:
  • Do we need copyright headers in all files?

I'd say that at least in source files we do.

  • Should the copyright header contain some version information (this gets very easily out of date)?

Unless it's auto-generated through some automated mechanism, we should not. I'm not familiar with the possibilities, but there should be some git hooks for such tasks.

  • Should the copyright header contain year numbers (also gets out of date)?

If it is not important to have files with up-to-date year numbers, then we could just as well omit them. If it is important, it would be better to update the year automatically.

  • Should the copyright header contain names of people, or just generic reference to Gromacs developers? (list of developers could be maintained separately)

I vote for separate list of developers as it's easier to maintain.

  • Should the copyright header contain the license text and/or explanation of the license terms, or just a reference to a separate text file?

I think a reference to the license file is enough.

  • What should the license text be? Currently some files say GPL, some LGPL, some (at least thread_mpi) are under BSD-style license.

The decision on LGPL has been made and AFAIK Erik got OK from all present and past developers. I'm not sure about thread_mpi, though.

#7 Updated by Teemu Murtola almost 8 years ago

Szilárd Páll wrote:

As it has been decided that with 4.6 we are switching to LGPL, I think we should try to come up with a strategy. I would prefer to have clean headers that reflect the move to LGPL and also address the other issues.

Ah, that's news at least to me.

The decision on LGPL has been made and AFAIK Erik got OK from all present and past developers. I'm not sure about thread_mpi, though.

Well, I was not asked. ;) Not that I would have anything against it, but I think this somehow reflects how little information flow there is within the project...

For both the version number and years, I think it would be best to omit them unless someone feels that for legal reasons we must have those. The more often we will update those, the more noise we will have in the version history of every file. Unless we add some pre-commit hook, but that can only be done at the client side, and can get complex.

#8 Updated by Erik Lindahl almost 8 years ago

Ah, that's news at least to me.

The decision on LGPL has been made and AFAIK Erik got OK from all present and past developers. I'm not sure about thread_mpi, though.

Well, I was not asked. ;) Not that I would have anything against it, but I think this somehow reflects how little information flow there is within the project...

For both the version number and years, I think it would be best to omit them unless someone feels that for legal reasons we must have those. The more often we will update those, the more noise we will have in the version history of every file. Unless we add some pre-commit hook, but that can only be done at the client side, and can get complex.

Well, not quite - What I said is that I have explicitly mailed old (no longer active) developers who still have code committed (e.g. Peter Tieleman) and also people outside our groups (e.g. Michael Shirts, Roland Schulz). For currently active developers, the topic has been mentioned a couple of times on the mailing lists since 2009, we've raised it at the Gromacs meetings, and people there have not protested. However, Teemu might have fallen between chairs since he has not really been an independent group, but also has not belonged to us since 2009 or been at the Gromacs meetings. Teemu - I take your comment above as meaning you're OK with it, but you are more than welcome to mail me (privately is OK too) if there are code contributions you do not want under LGPL for whatever reason.

As for copyright headers, there needs to be some sort of copyright owner to deal with companies, etc. when there are questions.
We've said "the gromacs team" to emphasize all users, but since that is not technically a legal entity we have also listed the main developers' names - that seems smarter than using our universities and including all their legal departments (and in Sweden it's even legal due to the teacher's exception).

#9 Updated by Teemu Murtola almost 8 years ago

I think I recall some discussion quite some time ago, but nothing that I would have taken as a concrete decision to move into any direction. ;) But as said, I'm fine with the change.

For the actual copyright headers in the source files, I think the main point of my questions is whether something like this would be sufficient:

/* Copyright (c) The Gromacs development team (see AUTHORS file in the distribution)
 * See COPYING file in the distribution for license terms.
 */

This would be lightweight, easy to maintain (basically, no changes needed in the source files after adding the header, even if people or the license changes). Currently, most of the information that is contained in the extended header is anyways outdated...

#10 Updated by Erik Lindahl almost 8 years ago

Unfortunately I think we need the actual license text in the file - we've previously gotten bug reports from package maintainers in Linux distros when that was missing. Referring to an authors file should be fine, though!

#11 Updated by Szilárd Páll almost 8 years ago

Erik Lindahl wrote:

Unfortunately I think we need the actual license text in the file - we've previously gotten bug reports from package maintainers in Linux distros when that was missing. Referring to an authors file should be fine, though!

In that case would it be OK to include the license text without a reference to the year? That way we could avoid having outdated headers. Otherwise, we can as well include version (at least major.minor) because we'll have to update all headers at least once a year -- not a big deal if automated, but it does result in "fake" updates on files.

#12 Updated by Erik Lindahl almost 8 years ago

Hi,

I would simply say "Copyright (c) 2011-" if we create a particular file for the first time in 2011, and then we don't have to update it. If somebody wants to write a script and re-process all the files we have, we can use 1995 for that year!

#13 Updated by Teemu Murtola almost 8 years ago

Well, here's another sketch:

/* This file is part of Gromacs, a molecular simulation package.
 * Copyright (c) YYYY- The Gromacs development team
 * (see AUTHORS file in the distribution)
 *
 * Gromacs is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * See COPYING file in the distribution and http://www.gromacs.org
 * for more information.
 */

Or could we be even briefer and just mention LGPL v2.1 with a reference to www.gnu.org? Or do we want to keep all the text there now is in every source file? Having it only in COPYING makes maintenance easier (e.g., if someone wants to change wording in other parts than the actual license) and reduces the proportion of comments in files (in particular with C++, we may end up with a lot of relatively small files, where the copyright header is a significant portion of the contents). But this possibly also makes people less aware of it.

Formatting the copyright notice is one thing; another is to decide where to apply it. Files produced by us are easy, but currently we also have some that are taken from external sources (in master, some of those are under src/external/). Most (all?) of those are distributed with a BSD-style license (there may be some in the public domain), and the licenses are in COPYING-OTHER (and possibly also in the source files or somewhere else under src/external/). We have made varying amounts of changes to those files (boost and Google Test are the ones that I know more details of, and those are essentially untouched except for one or two one-line fixes), and some of these files under src/external/ now have the Gromacs copyright header prepended to them and some don't. This is also something where we should have a clear policy.

#14 Updated by Erik Lindahl almost 8 years ago

Hi,

I think Teemu's sketch is great.

Files that are completely taken from some other source should not have any copyright headers from us. However, when we modify them or do other stuff where we feel it is important to stress it is not the original version I usually append our header, and then have a second description below saying a file was taken from somewhere, and that people are allowed to redistribute that code under the original license. Not perfect, but that will work even if there are multiple contributions from different sources in one file.

From a legal point there is apparently no such thing as "public domain". FFTPACK is an example of code that doesn't really have any copyright header, but people treat it as public domain. We should probably add something there too, but I have no idea what to write...

On the other hand: If we were Apple & Samsung these things would be super-delicate, but for an open source project it is hopefully not a matter of life and death. Let's just try our best, and accept that it might not be 100% perfect.

#15 Updated by Szilárd Páll almost 8 years ago

Erik Lindahl wrote:

Hi,

I would simply say "Copyright (c) 2011-" if we create a particular file for the first time in 2011, and then we don't have to update it. If somebody wants to write a script and re-process all the files we have, we can use 1995 for that year!

We don't need a separate script, we can use one of the following (note: I've only tried the first one):

#16 Updated by Teemu Murtola almost 8 years ago

  • Target version changed from 5.0 to 4.6

Do you Szilard have some idea how you want to run headache (or some other tool)? Would this be a one-off exercise, after which we (try to) maintain them manually? This will be much easier, though, if the header does not contain any information that needs to be changed. Or should we set up some system that automatically inserts the headers, either on client side, or checks them on server side (e.g., on Jenkins)? The latter might be more work than it is worth.

Have you Szilard tested whether headache can cleanly remove the existing headers and replace them with new ones, or is there manual work (or extra scripting) required for that? For generating the headers, if we want to set separate years for existing files based on when they were created, it would probably require a separate script. It's probably not possible to reliably find the year when the code was introduced (if it has, e.g., been moved), but we could possibly put a bit of effort for recent code (e.g., C++ code added in master for sure hasn't been written before 2009, and selection code not before 2007 (and that was also included in Gromacs in 2009)). The year information could be nice in some cases.

Changed the target version to 4.6, since the license probably needs to be updated somewhere if we want to change it, and it will be quite confusing if the source files say something else.

#17 Updated by Szilárd Páll almost 8 years ago

Teemu Murtola wrote:

Do you Szilard have some idea how you want to run headache (or some other tool)? Would this be a one-off exercise, after which we (try to) maintain them manually? This will be much easier, though, if the header does not contain any information that needs to be changed. Or should we set up some system that automatically inserts the headers, either on client side, or checks them on server side (e.g., on Jenkins)? The latter might be more work than it is worth.

Well, it could be done either way. I know that some people run a check on the header as a repository pre-commit hook, reject commits that contain source files with incorrectly formatted headers, and update headers in post-commit hooks. However, since we do not plan to have a closed date interval or a version number, I think we would be fine doing it once and then maintaining it manually, perhaps adding some custom regexp-based checks as a Jenkins module.
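
A regexp-based check of this kind could look roughly like the Python sketch below. It is only an illustration: the pattern is based on the header sketched in comment #13, the exact wording of the final header is an assumption, and the names HEADER_RE and has_expected_header are hypothetical.

```python
import re

# Hypothetical pattern based on the header sketched in comment #13; the
# exact wording of the final header is an assumption.
HEADER_RE = re.compile(
    r"\A/\* This file is part of Gromacs.*\n"
    r" \* Copyright \(c\) \d{4}- The Gromacs development team\n"
)


def has_expected_header(source_text):
    """Return True if the text starts with the expected comment header."""
    return HEADER_RE.match(source_text) is not None
```

A Jenkins job or commit hook could call has_expected_header on the contents of each changed file and report the ones that fail.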

Have you Szilard tested whether headache can cleanly remove the existing headers and replace them with new ones, or is there manual work (or extra scripting) required for that? For generating the headers, if we want to set separate years for existing files based on when they were created, it would probably require a separate script. It's probably not possible to reliably find the year when the code was introduced (if it has, e.g., been moved), but we could possibly put a bit of effort for recent code (e.g., C++ code added in master for sure hasn't been written before 2009, and selection code not before 2007 (and that was also included in Gromacs in 2009)). The year information could be nice in some cases.

To be honest, I have only tested headache with toy examples and have not thought about the fact that we have to keep some of the information from the current headers, which might make things pretty tricky both with and without a tool.

Changed the target version to 4.6, since the license probably needs to be updated somewhere if we want to change it, and it will be quite confusing if the source files say something else.

Agreed.

#18 Updated by Teemu Murtola almost 8 years ago

  • Assignee set to Mark Abraham

At least most of this was done by Mark in e6cd064a, but without a reference to this issue. Assigning to Mark for adapting his approach to master branch.

Since the new copyright notice does include the year, we need some approach for keeping those up-to-date as well. And it seems that some old copyright lines with pre-existing years were kept within the new notice, meaning that the notices are not identical in all files. Is this the best approach? I'm quite sure that those pre-existing years were not accurate either...

#19 Updated by Teemu Murtola over 7 years ago

Copied the TODO list from e6cd064 to make sure it doesn't get lost (I'm not aware of anything from this list having been addressed):
  • vectype_ops.cuh probably from CUDA SDK needs attribution
  • fix licensing statements for things we have stolen from CMake
  • fix things under admin, scripts

#20 Updated by Teemu Murtola over 7 years ago

  • Status changed from New to In Progress

#21 Updated by Szilárd Páll over 7 years ago

Teemu Murtola wrote:

Copied the TODO list from e6cd064 to make sure it doesn't get lost (I'm not aware of anything from this list having been addressed):

Good idea!

  • vectype_ops.cuh probably from CUDA SDK needs attribution

I'm not entirely sure what the correct way to deal with these cases is. This file defines commodity vector operations for CUDA vector types by overloading operators (and not only that). The initial few operations were lifted from the CUDA SDK, but the majority were added manually later. Still, most of the content of this file closely resembles (or even matches) the code in the SDK.

What's the procedure in this case? Can we claim that it's our code?

#22 Updated by Erik Lindahl over 7 years ago

I guess the Google defense is that you cannot copyright APIs ;-)

I don't think we have to make a big deal about it - let's just add a comment about what types were borrowed from CUDA SDK, and if anybody ever complains we can deal with it then.

#23 Updated by Szilárd Páll over 7 years ago

Erik Lindahl wrote:

I guess the Google defense is that you cannot copyright APIs ;-)

I don't think we have to make a big deal about it - let's just add a comment about what types were borrowed from CUDA SDK, and if anybody ever complains we can deal with it then.

Sure, makes sense. However, it's not really the type we borrowed, but some of the operator overloading, e.g.

inline __host__ __device__ float3 make_float3(float4 a) {
    return make_float3(a.x, a.y, a.z);
}

or shorthands for creating a zeroed vector-type variable, e.g.

inline __host__ __device__ float3 make_float3(float s) {
    return make_float3(s, s, s);
}

So instead of pinpointing which ones were copied verbatim and which ones inspired, I suggest including a text like: "The functions below have been partially reused from the math helper functions provided by the NVIDIA CUDA SDK (see CUDA_SDK_ROOT/C/common/inc/helper_math.h)."

Does that sound OK?

#24 Updated by Teemu Murtola over 7 years ago

I'll try one more time to create a bit of discussion on this topic. I would mainly want to clarify what we want to do in the future to avoid ending up in a situation that created this issue in the first place. Fixing the historic copyright notices isn't that important (although I would like to do it for the files that I've written, as that isn't really that much work; but right now, I have no idea what would be the "correct" fix, because of reasons below).

As it is now, I think even these two fundamental questions are quite unclear:
  • What is the scope of the copyright notices in the source files?
    • Is it the contents of that individual file?
    • Or is it all of Gromacs? In this case, it should really be identical for all files (except possibly for files that have earlier been published somewhere else and not as part of Gromacs, but even there, the extra notices could appear outside our own). My interpretation of this issue was that this was the original goal, but the approach taken (preserving whatever inaccurate historic information there was) seems to indicate that this is not desirable.
  • What constitutes a "copyrightable" year?
    • Is it enough that the contents have been available in public source control in that year?
    • Or does it require that the contents have actually been modified that year? Or conversely, is it sufficient to assert copyright only for those years when there actually have been modifications? To me, this is the most natural interpretation.
    • Or should we only include years when actual releases have been made?
    • Is it enough to only modify the copyright year to count as "modified" in that year? To me, modifying the year without any other changes that year just produces unnecessary noise in the log and in the header.

I was originally for having a uniform copyright notice in all files. But that may create more exceptions than it is worth (and also a lot of noise if we need to include each year in the notice), so I'm now leaning towards treating each file individually. Based on the discussion I've seen so far (which is not that much), I would propose that we do that, and only list those years when actual modifications have been made. This can be done with a relatively simple script (which isn't any more work to use than the current approach), and/or with a pre-commit hook. I have already drafted a commit hook that can be used for this purpose (most of the tricky parts are also necessary for #845, if we want to have a commit hook for that, so the work isn't wasted even if we don't want it here).

If we decide to go the commit hook way, I would propose that we use #648 for discussing the general approach we want to take for writing, modifying and distributing those hooks. It would be nice to also have them reviewed on gerrit.

Finally, I don't want to start another argument on the contents of the header, but the current one is so long that it will make up a significant portion of some files, which in turn often makes git think that those files are copies or renames of each other. This doesn't really matter in most practical use, but it makes, e.g., git log --follow less useful, and some files also show up as renames of unrelated files in gerrit for some patches.
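To illustrate the effect described above, here is a rough sketch in Python. It uses difflib as a stand-in for git's similarity measure, which is an assumption made purely for illustration: git computes similarity with its own algorithm, although its default rename threshold is 50%. The header text, file contents and the line_similarity helper are all made up.

```python
import difflib


def line_similarity(a, b):
    """Line-based similarity in [0, 1]; a stand-in for git's own measure."""
    return difflib.SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


# Hypothetical 30-line license header shared by two otherwise unrelated files.
header = "\n".join(f" * header line {i}" for i in range(30))
body_a = "\n".join(f"int a{i};" for i in range(5))
body_b = "\n".join(f"float b{i};" for i in range(5))

file_a = header + "\n" + body_a
file_b = header + "\n" + body_b

# The bodies share no lines, yet the whole files look mostly alike.
print(line_similarity(body_a, body_b))
print(line_similarity(file_a, file_b))
```

With a header this much longer than the actual code, the whole-file similarity ends up well above 50% even though the two bodies have nothing in common.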

#25 Updated by Mark Abraham over 7 years ago

  • Target version changed from 4.6 to N/A

Teemu Murtola wrote:

I'll try one more time to create a bit of discussion on this topic. I would mainly like to clarify what we want to do in the future to avoid ending up again in the situation that created this issue in the first place. Fixing the historic copyright notices isn't that important (although I would like to do it for the files that I've written, as that isn't really that much work; but right now, I have no idea what the "correct" fix would be, for the reasons below).

As it is now, I think even these two fundamental questions are quite unclear:
  • What is the scope of the copyright notices in the source files?
    • Is it the contents of that individual file?
    • Or is it all of Gromacs?

I have interpreted http://www.gnu.org/licenses/gpl-howto.html and the final section of http://www.gnu.org/licenses/lgpl-2.1.html to mean that we should probably attempt a non-trivial copyright header in each source file. I have been prepared to overlook CMake files so far.

In this case, it should really be identical for all files (except possibly for files that have earlier been published somewhere else and not as part of Gromacs, but even there, the extra notices could appear outside our own). My interpretation of this issue was that this was the original goal, but the approach taken (preserving whatever inaccurate historic information there was) seems to indicate that this is not desirable.

I took the view that it was not my business to attempt to retroactively correct what may or may not have been accurate or useful statements of copyright from the past.

  • What constitutes a "copyrightable" year?

http://www.gnu.org/licenses/gpl-howto.html is vague, but suggests that a copyrightable year is one in which the code is released. "Release" is undefined, and I would not want to argue the point in court that code available in public-access repositories on the internet was not released simply because nobody can find an email from someone announcing that it has been released. Particularly as the people who'd likely be wanting to argue the point that it is not released would include those in the nebulous category of being in a position to assert that they can make an official release. It's a mess. Even the converse (that a release email that really has been sent by someone we regard as having authority to do that constitutes a release) is hard to establish in court because there is no concrete product or verifiable paper chain. Copyright sucks for software, but it's the best we have.

  • Is it enough that the contents have been available in public source control in that year?
  • Or does it require that the contents have actually been modified that year? Or conversely, is it sufficient to assert copyright only for those years when there actually have been modifications? To me, this is the most natural interpretation.

In practice, I think it is appropriate to copyright a year in which we change a source file.

  • Or should we only include years when actual releases have been made?

I think that creates a theoretical vulnerability: someone could take a copy of either uncopyrighted code or code copyrighted for a future year (in anticipation of a release schedule) and argue that because we had not released the code we have not copyrighted it, so they can do so. Conversely, if we have made a copyright statement on code for which we have no plans to make a release in that year, I think it will be hard for us to lose from that.

  • Is it enough to only modify the copyright year to count as "modified" in that year? To me, modifying the year without any other changes that year just produces unnecessary noise in the log and in the header.

Yeah I'd like to not have noise like that.

I was originally for having a uniform copyright notice in all files. But that may create more exceptions than it is worth (and also a lot of noise if we need to include each year in the notice), so I'm now leaning towards treating each file individually. Based on the discussion I've seen so far (which is not that much), I would propose that we do that, and only list those years when actual modifications have been made. This can be done with a relatively simple script (which isn't any more work to use than the current approach), and/or with a pre-commit hook. I have already drafted a commit hook that can be used for this purpose (most of the tricky parts are also necessary for #845, if we want to have a commit hook for that, so the work isn't wasted even if we don't want it here).

I'd be happy to use a hook that added the current year to a copyright statement the first time it was applicable. If we wanted to use a range of years, then we would have to have some text explaining what that means (per http://www.gnu.org/licenses/gpl-howto.html). Since I am not in a position to say that is true for years before 2012 or so, I think that if we use ranges of years in future, we should do so starting from 2012.
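A minimal sketch of what such a year bump could look like, in Python. This is not the actual hook: the add_year name and the assumed notice format (a comma-separated list of explicit years after "Copyright (c) ") are illustrative assumptions. Notices that already use a range of years, or that do not match the assumed format, are deliberately left untouched.

```python
import re

# Assumed notice format: "Copyright (c) 2012,2013, by ..." with explicit years.
# The lookahead refuses to match when the year list continues as a range.
NOTICE_RE = re.compile(r"^(.*Copyright \(c\) )(\d{4}(?:,\d{4})*)(?![\d-])(.*)$")


def add_year(line, year):
    """Return line with year appended to its year list, if not already listed."""
    match = NOTICE_RE.match(line)
    if match is None:
        return line  # unrecognised or range-style notice: leave it alone
    prefix, years, suffix = match.groups()
    if str(year) in years.split(","):
        return line  # this year has already been asserted
    return f"{prefix}{years},{year}{suffix}"


notice = " * Copyright (c) 2012,2013, by the GROMACS development team"
print(add_year(notice, 2014))
```

Applying it a second time with the same year leaves the line unchanged, so a hook built on this would only touch the header the first time a file is modified in a given year.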

It would clearly be inappropriate to slap "Copyright 1995-2013" on every file.

We certainly have to preserve external people's independent copyright statements. I think it is reasonable to not modify existing copyright statements even if they might be not-very-appropriate.

If we decide to go the commit hook way, I would propose that we use #648 for discussing the general approach we want to take for writing, modifying and distributing those hooks. It would be nice to also have them reviewed on gerrit.

OK

Finally, I don't want to start another argument on the contents of the header, but the current one is so long that it will make up a significant portion of some files, which in turn often makes git think that those files are copies or renames of each other. This doesn't really matter in most practical use, but it makes, e.g., git log --follow less useful, and some files also show up as renames of unrelated files in gerrit for some patches.

My reading of http://www.gnu.org/licenses/gpl-howto.html is that only our last two paragraphs about derivative works and funding are legally optional. Whether those last two paragraphs do us more harm than good is arguable, but I think we have more important things to discuss :-)

#26 Updated by Teemu Murtola over 7 years ago

I'm not keen on retrospectively fixing the copyrightable years throughout the code either (although I did that for my code in 3347654d). But I think it would be great if we had a common policy for going forward, such that the information from now on would be more or less accurate. And possibly, if people have interest, they could fix some individual files when they notice that they have, e.g., been created in 2010, so can't possibly have any copyright before that. In particular with 5.0 and going forward, if parts of the code are getting rewritten, we could also then slowly get rid of more and more of the historical ballast.

I have a Python script that, combined with https://gerrit.gromacs.org/#/c/2155/, does implement the hook. It does require some tweaking to support preserving arbitrary existing copyright notices, which I haven't done yet since currently the master branch still doesn't have the proper copyright headers. But I have been using it for changes I've done after 3347654d, and it is working reasonably well. If we want to go this way, we simply need to decide how we want to enforce the policy and for which files (C/C++ source files only, or also CMake files and/or other scripts). The selection of any particular subset of files should be relatively easy with .gitattributes files.

#27 Updated by Mark Abraham over 7 years ago

I would suggest a policy that any source file we change or add to should get a copyright bump. I guess a change that is simply a deletion is reasonable to treat similarly. Certainly convenient to do that. Source should mean all C, C++, kernel generator .py. Random scripts outside src can probably be ignored. Happy to ignore CMake files since they are often small.
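A sketch of how that selection could be expressed, in Python. The needs_copyright_bump name, the extension list and the example paths are assumptions for illustration only; as noted above, .gitattributes files would be another way to pick the subset.

```python
from pathlib import PurePosixPath

# Assumed extensions for "all C, C++, kernel generator .py"; adjust as needed.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".cu", ".cuh", ".py"}


def needs_copyright_bump(path):
    """Decide whether a changed file falls under the proposed policy."""
    p = PurePosixPath(path)
    # CMake files are ignored, since they are often small.
    if p.name == "CMakeLists.txt" or p.suffix == ".cmake":
        return False
    # Scripts and other files outside src are ignored.
    if not p.parts or p.parts[0] != "src":
        return False
    return p.suffix in SOURCE_SUFFIXES


# Hypothetical repository paths, relative to the repository root.
for name in ["src/mdlib/force.c", "scripts/tool.py", "src/CMakeLists.txt"]:
    print(name, needs_copyright_bump(name))
```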

#28 Updated by Teemu Murtola about 7 years ago

  • Target version changed from N/A to 5.0

#29 Updated by Teemu Murtola almost 7 years ago

  • Subject changed from clean up source file headers and set up automatic generation to Clean up source file headers and set up automatic generation
  • Status changed from In Progress to Fix uploaded

With changes leading up to https://gerrit.gromacs.org/#/c/2945/, the source files should now have up-to-date headers, and there are scripts and commit hooks to keep them up-to-date. I still have a script that isn't uploaded, which I've used for finding out those files where the copyright bump has been forgotten (to do stuff like https://gerrit.gromacs.org/#/c/2945/). Also, if we set up automatic checking for source formatting using uncrustify, we could easily also add checks for copyright headers, using the same scripts.
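For illustration, a check for a forgotten copyright bump could be sketched roughly as follows in Python. This is not the script mentioned above; the function names are made up, and the year of the last modification is assumed to be obtained separately from version control.

```python
import re


def listed_years(header):
    """Collect every year asserted on 'Copyright' lines, expanding ranges."""
    years = set()
    for line in header.splitlines():
        if "Copyright" not in line:
            continue
        # Matches single years ("2012") as well as ranges ("1991-2000").
        for first, last in re.findall(r"\b(\d{4})(?:-(\d{4}))?\b", line):
            years.update(range(int(first), int(last or first) + 1))
    return years


def bump_forgotten(header, last_modified_year):
    """True if the file changed in a year that its header does not list."""
    return last_modified_year not in listed_years(header)


header = (
    " * Copyright (c) 1991-2000, University of Groningen\n"
    " * Copyright (c) 2012,2013, by the GROMACS development team\n"
)
print(bump_forgotten(header, 2014))
```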

#30 Updated by Gerrit Code Review Bot over 6 years ago

Gerrit received a related patchset '1' for Issue #818.
Uploader: Teemu Murtola
Change-Id: I80ce1b936fa9b77a94eb3dfe8fcff0490a886f42
Gerrit URL: https://gerrit.gromacs.org/2951

#31 Updated by Teemu Murtola over 6 years ago

  • Status changed from Fix uploaded to Resolved

Should be working now, including a Jenkins check. I will try to upload the mentioned script at some point, the next time there is a need for it.

#32 Updated by Rossen Apostolov over 6 years ago

  • Status changed from Resolved to Closed
