Skip to content

Conversation

@rabauke
Copy link

@rabauke rabauke commented Apr 14, 2024

Given the typical storage capacities of recent hardware, dieharder should support reading PRNs from files with more than 2^31 numbers.

}
if(strncmp(splitbuf[0],"count",5) == 0){
state->flen = atoi(splitbuf[1]);
state->flen = atoll(splitbuf[1]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change. count might be larger than INT_MAX, thus atoll is preferred.

state->rptr++;
state->rtot++;
if(verbose){
fprintf(stdout,"# file_input() %lu: %lu/%lu -> %u\n", state->rtot, state->rptr,state->flen,(uint)iret);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the original off_t is an alias for the signed type long int. Furthermore, one can never know what off_t actually is. Could be any signed 64-bit-integer, e.g., long int or long long int on Linux on x64, which are distinct types with distinct format strings.

FILE *fp;
off_t flen;
off_t rptr;
off_t rtot;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no good reason for using off_t. No POSIX api is used, which requires off_t.

@eddelbuettel
Copy link
Owner

While we're at it, should we modernise this use of long vs long long and pin it down with int64_t ?

and improve error handling in parameter parsing
@rabauke
Copy link
Author

rabauke commented Apr 14, 2024

@eddelbuettel I updated the PR incorporating your suggestion.

Copy link
Owner

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice enhancement!

@eddelbuettel
Copy link
Owner

@mbroz Before I merge it, do you mind having a look at it too?

@mbroz
Copy link
Contributor

mbroz commented Apr 15, 2024

Well, I am afraid this will not be so simple. First, this is the ABI/API libdieharder break, which modifies exported prototypes in libdieharder.h (I guess the Debian build will scream for this).
And why not use uint64_t here? Also, IIRC that sscanf does not check for some conversion errors (we usually use strtoull() and check errno for ERANGE, but that's detail.)

Anyway, this will probably not be a simple place where a 32-bit count is used...
(We fight with side effects with various "simple" changes for 64-bit offset for years in storage applications, so be very careful here.)
I can check it, but not until next week, if it can wait.

@rabauke
Copy link
Author

rabauke commented Apr 15, 2024

If ABI/API stability is a concern, I could revise the change to use off_t again. The main issue with off_t is that it has been introduced for a rather specific use-case. It's not a general purpose integer type and there is no sane way to convert a string directly into an off_t. atoi (which is currently used) is certainly not a good choice.

I was not aware that the behaviour of sscanf is undefined if the result of the conversion cannot be represented in the object (e.g. integer overflow). I agree, strto* should be used instead of ato* or *scanf.

@mbroz
Copy link
Contributor

mbroz commented Apr 29, 2024

I think most of this patch is actually not needed.

dieharder.h alteady contains line
#define _FILE_OFFSET_BITS 64
that makes off_t type 64bit on all platforms.

Perhaps we can add AC_SYS_LARGEFILE macro to configure.ac that will switch off_t to 64bit in config.h globally.

But even without that, I see sizeof(off_t) 8-bytes (64bits) even on 32bit platforms. So why we need to redefine off_t to different type? Perhaps only CLI parsing fix is needed?

@eddelbuettel
Copy link
Owner

dieharder is clearly a child of another time and period, and 32-bit systems are mostly gone now.

And I generally like both to simplify, and to keep changes small and limited. @rabauke raised a valid point: we can / could / should support larger files. I still see as a bit of edge case. Maybe we can try to address that more narrowly?

@mbroz
Copy link
Contributor

mbroz commented Apr 29, 2024

I think that redefinition of _FILE_OFFSET_BITS already switches everything to large file support. Compilation on 32bit produces some warnings (some of them will be off_t in printf for sure), but these should be easily fixed. I am not sure anyone uses dieharder on 32bit systems, but with large file support it should work identically to 64bit systems.

So the only problem here is to properly handle 64bit off_t in terminal input/output. Or do I miss anything here?

}
}
if(strncmp(splitbuf[0],"numbit",6) == 0){
filenumbits = atoi(splitbuf[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more comment - filenumbits is int, so atoi is enough here. Just it should check for 0 as error (0 does not make sense anyway here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants