Project

General

Profile

Feature #2570

Better string formatting and printing

Added by Joe Jordan 2 months ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Difficulty:
uncategorized
Close

Description

The transition to c++ means that there is an unholy mix of std::sting and char floating around the codebase. This means there is a lot of converting between types; currently there are >500 instances of c_str(). The current potential solutions leave some things still desired e.g.:
IOStream lacks formated output and needs an unaesthetic number of chevrons.
gmx::formatString is somewhat useful here but still needs c_str() conversions.

A potential solution (h/t Roland S.) is http://fmtlib.net/latest/index.html which has python-like positional formatted printing of arbitrary types. I don't know that we want to make gromacs dependent on the fmt library but it is worth discussing whether a similar functionality should exist in gromacs.

Associated revisions

Revision 4c266b29 (diff)
Added by Roland Schulz 2 months ago

Warn for type mismatch for gmx printf like functions 1/3

Automatic fixes with clang fix-it

Related #2570

Change-Id: Id565d2705c673d1e45318bb8bf1091c165a724de

Revision 2363a20f (diff)
Added by Joe Jordan 30 days ago

change some char to string in gmxpreprocess

Related to #2566 and #2570. As part of the refactoring of
pdb2gmx one thing that becomes clear is that a lot of variables
could have better defined scope and readability could be
improved if some char's are converted to string's. This can
ripple through the code base of gmxpreprocess. I have followed
some of those ripples here and will continue to follow in
later changes.

Change-Id: I0d12704ef27c06a4f64f8d1e8638760c8c7019ae

Revision 88754544 (diff)
Added by Roland Schulz 27 days ago

Enable more clang-tidy checks for new code

Some of the clang-tidy checks would require too many code
changes when enabled globally. To get the benefits from
the checks without changing all legacy code add a second
configuration. Which configuration is used is selected by
directory: Any new module should use the new configuration.

Additionally it is recommended to use the new configuration
on a per file or change basis (clang-tidy-diff.py) manually.

This change contains the proposed configuration for new code and
and enables it for three folders which contain new code.

Checks enabled relative to base configuration:
clang-analyzer-security.insecureAPI.strcpy
readability-inconsistent-declaration-parameter-name
readability-function-size
cppcoreguidelines-owning-memory
cppcoreguidelines-no-malloc
cppcoreguidelines-pro-type-member-init
cppcoreguidelines-pro-type-union-access

The reasons that the remaining checks are still disabled (don't
know a good place to document because json doesn't allow comments):
misc-incorrect-roundings: #2562
readability-else-after-return: I don't think it is something we want to follow
cppcoreguidelines-pro-type-*cast: While it is best to not do those cast there
are valid reasons for them. And because they are already easy to spot,
mandating a NOLINT for those is probably not helpful.
cppcoreguidelines-special-member-functions: conflicts with Wunused-member
cppcoreguidelines-pro-type-vararg: #2570
cppcoreguidelines-pro-bounds-constant-array-index: While I think it would be
nice to have a compile mode with bound checking enabled. It is so ugly
to have to write at(v, n) instead of v[n].
cppcoreguidelines-pro-bounds-array-to-pointer-decay: This would be particular
nice to have But it makes it very hard to write warning free code with
legacy APIs which uses e.g. rvec. Those have to be modernized first.
cppcoreguidelines-pro-bounds-pointer-arithmetic: This would also be very nice.
But it requires that depending APIs use ArrayRef rather than pointers
for non-owning array passing.

Change-Id: I891a576d2c185ef6587224a1a19324f1a8967237

Revision 6eb662a1 (diff)
Added by Roland Schulz 10 days ago

Warn for type mismatch for gmx printf like functions 2/3

Most manual fixes. Inclduing those with missing/extra
arguments.

Related #2570

Change-Id: I2046d323dad6420e61f47918b12aa538f4a6d4a0

Revision e5a941a4 (diff)
Added by Roland Schulz 9 days ago

Warn for type mismatch for gmx printf like functions 3/3

Add attributes for gcc (and compatible) and msvc.
And fix remaining issues.

Related #2570

Change-Id: I9b46559c2554404b309aa8710eb34ffa128d8cae

History

#1 Updated by Mark Abraham 2 months ago

There's a few related issues entangled here.

These kind outputs go to a stream, which should be a low level component used by some higher level writer that understands stuff like lines, indenting, wrapping and formatting. That permits the stream to be swapped out for testing, or do nothing if the kind of output should not be emitted at this verbosity level. Teemu made a lot of this already (see http://manual.gromacs.org/documentation/current/doxygen/html-full/classgmx_1_1TextWriter.xhtml and related TextReader and various text and file stream implementaionts). There's an example use case in write_inpfile() of src/gromacs/fileio/readinp.cpp for writing out .mdp files. A useful addition would be to borrow from the implementation of formatString and add to TextWriter some methods like writeFormattedLine() and writeFormattedString(). C++11 argument forwarding makes this easier than back when Teemu first implemented formatString.

An even more useful thing would be to forward the arguments after using metaprogramming to call c_str() on any type that supports it. Who's keen to hurt their brain a bit? :-) Example at https://msdn.microsoft.com/en-us/magazine/dn913181.aspx

Avoiding other use of c_str() is often a matter of providing a convenience overload, or refactoring to no longer have a C-style consumer. However there are plenty of cases where neither of those is feasible in the scope of a current change, and one should use c_str() and maybe leave a TODO. And later coming back to resolve them some time ;-) Some good news is that there is tooling that can find that a c_str() is being used to construct a temporary std::string to pass to a call that now uses std::string rather than const char *.

Type safe formatting is a nice thing to have, but it's a stretch goal. Until the string output of some area of the code is under test (e.g. by using the above approaches), it isn't be a priority to replace the formatting by something else (even if the library dependency was 100% desirable) because there's a risk of inadvertently changing the usability of the resulting output. I agree there's good things that would flow from such formatting, but there are simpler ways to get bigger impact for now.

#2 Updated by Roland Schulz 2 months ago

TextWriter and formatString are orthogonal. If we had a better formatString that could be used by TextWriter.

Writing a formatString which is better requires a lot of code. As you say we would need overloading. And va_args doesn't work with overloading and is anyhow very ugly (core-guidelines says you shoulnd't use). It should be a variadic template. The problem is that there is no built-in function to just replace the first conversion "%..." of the fmt string. But that is required when replacing va_args because you need to iterate (or recurse) over the arguments. Thus you would end having to implement your own conversion function. And in that case you would end up with almost as much code as if you just take the one header from fmt which does that (that code isn't big). And if you anyhow have a type-aware conversion function (which you would automatically get by using a variadic template) there is no reason not to make it typesafe.

There is no need to change the print format. fmt supports printf style as legacy format: http://fmtlib.net/latest/api.html#printf-api . Whether we decide to stick to the legacy fmt even for new code (to have consistent format) or use the new format for new code (to avoid specifying information which is redundant for a typesafe printing) is a separate question.

#3 Updated by Mark Abraham 2 months ago

Roland Schulz wrote:

TextWriter and formatString are orthogonal. If we had a better formatString that could be used by TextWriter.

Writing a formatString which is better requires a lot of code. As you say we would need overloading. And va_args doesn't work with overloading and is anyhow very ugly (core-guidelines says you shoulnd't use). It should be a variadic template. The problem is that there is no built-in function to just replace the first conversion "%..." of the fmt string. But that is required when replacing va_args because you need to iterate (or recurse) over the arguments. Thus you would end having to implement your own conversion function. And in that case you would end up with almost as much code as if you just take the one header from fmt which does that (that code isn't big). And if you anyhow have a type-aware conversion function (which you would automatically get by using a variadic template) there is no reason not to make it typesafe.

I think we're talking about different things. I'm proposing a variadic template function that recognizes any std::string in the template parameter pack and uses c_str() to convert any such arg, which then calls something like formatString() with the resulting function parameter pack. That doesn't touch the format string or va_args, and sounds to me like a much lighter weight way to let reduce developers frequency of calling c_str().

#4 Updated by Roland Schulz 2 months ago

I agree. I misunderstood you and your solution should be easy to do.
In gerrit, I was talking mostly about the problem that forgetting .cstr() isn't a compiler error, rather than the very minor issue of having to type 7 more characters. And of course this type safety issue is something which doesn't just apply to string. But of course there would be an advantage to at least avoid the string usage error without fully resolving the type safety issue. If we don't do a proper type safe solution we should at least add format function attribute to formatString.

#5 Updated by Roland Schulz 2 months ago

const char* conv(const char *s) { return s;}
const char* conv(const std::string &s) { return s.c_str();}

//template<typename ...Args>
//std::string formatString(const char*, Args...) __attribute__ ((format (printf, 1, 2)));

template<typename ...Args>
std::string formatString(const char* fmt, Args...args)
{
    return formatStringImpl(fmt, conv(args)...);
}

This works. But only without the attribute because the attribute is only valid for variadic functions not variadic templates. I think the warning is more important than the convenience.

#6 Updated by Gerrit Code Review Bot 2 months ago

Gerrit received a related patchset '1' for Issue #2570.
Uploader: Roland Schulz ()
Change-Id: gromacs~master~Ie2bc5181a7fee1c0d1a647a12d8c73c4fb3ebe96
Gerrit URL: https://gerrit.gromacs.org/8080

#7 Updated by Gerrit Code Review Bot 2 months ago

Gerrit received a related patchset '1' for Issue #2570.
Uploader: Roland Schulz ()
Change-Id: gromacs~master~Id565d2705c673d1e45318bb8bf1091c165a724de
Gerrit URL: https://gerrit.gromacs.org/8082

#8 Updated by Gerrit Code Review Bot 2 months ago

Gerrit received a related patchset '1' for Issue #2570.
Uploader: Roland Schulz ()
Change-Id: gromacs~master~I2046d323dad6420e61f47918b12aa538f4a6d4a0
Gerrit URL: https://gerrit.gromacs.org/8083

#9 Updated by Gerrit Code Review Bot 2 months ago

Gerrit received a related patchset '1' for Issue #2570.
Uploader: Roland Schulz ()
Change-Id: gromacs~master~I9b46559c2554404b309aa8710eb34ffa128d8cae
Gerrit URL: https://gerrit.gromacs.org/8087

#10 Updated by Roland Schulz 2 months ago

absl also has a type-safe fmt library: https://abseil.io/docs/cpp/guides/format
Uses standard printf syntax. Ignoring the size modifier and supporting string/string_view.
And absl has a lot of other things which would improve code quality a lot (span, fixed_array, optional, variant).

#11 Updated by Roland Schulz 2 months ago

For compile time format string checking absl requires clang's enable_if attribute and fmt requires full c++14 constexpr support(clang, msvc, gcc >=6). While neither requirement covers all compilers, as long some Jenkins configuration have compile time checking enabled, it should be sufficient.

#12 Updated by Gerrit Code Review Bot about 2 months ago

Gerrit received a related patchset '1' for Issue #2570.
Uploader: Joe Jordan ()
Change-Id: gromacs~master~I0d12704ef27c06a4f64f8d1e8638760c8c7019ae
Gerrit URL: https://gerrit.gromacs.org/8134

#13 Updated by Gerrit Code Review Bot about 2 months ago

Gerrit received a related patchset '6' for Issue #2570.
Uploader: Roland Schulz ()
Change-Id: gromacs~master~I891a576d2c185ef6587224a1a19324f1a8967237
Gerrit URL: https://gerrit.gromacs.org/8130

Also available in: Atom PDF