Non-zero-padded exponent in float string representation after Ryu implementation

After the implementation of the Ryu algorithm for floating-point to string conversion (#14084), released in v1.11, the exponent in scientific notation is no longer zero-padded if it’s a single digit:

# before
printf("%E", 123.45) # => 1.234500E+02
printf("%E", 123.45e15) # => 1.234500E+17
# after
printf("%E", 123.45) # => 1.234500E+2
printf("%E", 123.45e15) # => 1.234500E+17

Is this intended behavior? I ask because the official and other implementations of Ryu print zero-padded exponents. Additionally, the few existing specs with zero-padded exponents prior to the change were not ported. This change may negatively affect a computational chemistry library that I have been working on for many years, where several file formats use scientific notation with zero-padded exponents in every example I know of. I cannot estimate the true impact of the change, as such files are meant to be read by external (sometimes old) analysis and visualization software.

Is there a way to recover the old behavior with some configuration?

Thanks.

I would assume this change was probably unintended. It slipped through.

There’s probably no good reason to break the previous behaviour here, especially when apparently everyone else does it that way as well.

This was chosen to be consistent with normal float printing, which also didn’t have the extra zero before Dragonbox was ported:

1e-6         # => 1.0e-6
1e-6.to_s    # => "1.0e-6"
1e-6.inspect # => "1.0e-6"

C scanf and strtof do not require the extra zero. Do you have any concrete example of external software that requires it?

I understand that, but I cannot ensure that all software and libraries use those or similar functions. Computational chemistry software is very heterogeneous, written in anything from Fortran and C/C++ to Python, and many libraries implement their own parsers for speed. As I said, most languages follow the C99 convention with leading zeros in the exponent. It allows nicely aligned numbers, which is useful in my case because these files encode 3D data and thus contain hundreds or thousands of lines. A similar situation was discussed at length in this issue regarding the TOML language.

In any case, the output that Crystal now produces differs from that of other languages and Ryu implementations, when it should follow the standard. I don’t think that internal consistency with normal float printing is enough of an argument to break the expected scientific notation output. Such a difference exists in many languages, where some even print 1e-6 as 0.000001.

If you agree, I’ll open an issue to ask for this change.

The TOML issue goes in precisely the opposite direction: it accepted e-6, but not e-06. If that issue never got resolved, then that’s all the more reason we should drop the extra zero.

What I want is evidence that some parser (not stringifier) accepts e-06 but not e-6.

You missed my point in citing that issue. The discussion focused on the fact that leading zeros in the exponent are a standard that most languages adhere to, and, as I said, they lead to nicely aligned numbers, which is important in my particular case. You are also incorrect, because the issue was indeed resolved by PR #676 as of Aug 22, 2019.

As you suggested, most software will probably be able to read scientific notation without leading zeros, but I cannot ensure that. Ultimately, the files produced by my software written in Crystal will differ from those of all other software because of, IMHO, an arbitrary change in the formatting made without any discussion or notice AFAIK (I couldn’t find any comment regarding this in the related issues and PRs).

I cannot provide a concrete example of what you ask because, honestly, I don’t have the time to test every software besides the few I regularly use.

I ask you to reconsider and revert to the previous format as, again, I don’t think that consistency with normal float printing is a compelling reason. I’d go as far as to suggest that the latter should comply with the standard scientific notation, but that’s another topic.

I think that’s the main issue here. #14084 was merely a change of implementation to be native in Crystal. This entailed a change of the printing algorithm, which of course can result in some changes of precision and representation.
But it was not clear that this would also introduce a change in the output format which is effectively independent of the printing algorithm. If it was unavoidable, we could accept such a change. But it is avoidable and unrelated to the change of algorithm.
If we wanted to change this, we should have a discussion about it, considering the consequences. But it wasn’t even obvious that this change took place. There’s no description about it, so it’s undocumented. And in my opinion, unintended.

So I think we should treat this as a regression and revert the format back to leading zero.

Thank you @straight-shoota for agreeing to revert this unintended change. I’ll open an issue about it.