Description
In looking over several examples, I've come around to the idea that we should strongly encourage re-use of the same datasets across language implementations (i.e., within a specific page). Advantages include:
- Common input and output for direct comparisons.
- Avoids typing up the task more than once (we can write it once under the implementation header) and cuts unnecessary in-code commenting (which, frankly, I think we have too much of atm and personally find quite distracting).
I recently did this for the Collapse a Dataset page and, personally, think it's a lot easier to read and compare the code examples now. @vincentarelbundock's Rdatasets is a very useful resource in this regard, since it provides a ton of datasets that can be directly read in as CSVs. (Both Julia and Python statsmodels have nice wrappers for it too.)
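To make the "directly read in as CSVs" point concrete, here is a minimal Python sketch. The URL layout (`.../Rdatasets/csv/<package>/<name>.csv`) is an assumption based on how the Rdatasets site is organized at the time of writing, and the helper names `rdatasets_csv_url` and `load_rdataset` are hypothetical, made up for illustration only.

```python
from urllib.parse import quote

# Assumed base URL for the Rdatasets CSV mirror (not confirmed in this thread).
RDATASETS_BASE = "https://vincentarelbundock.github.io/Rdatasets/csv"


def rdatasets_csv_url(package: str, name: str) -> str:
    """Build the CSV URL for a dataset, e.g. package='datasets', name='mtcars'."""
    return f"{RDATASETS_BASE}/{quote(package)}/{quote(name)}.csv"


def load_rdataset(package: str, name: str):
    """Download a dataset into a pandas DataFrame (needs pandas and network access)."""
    import pandas as pd  # imported lazily so the URL helper works without pandas

    # The first CSV column holds R's row names, so use it as the index.
    return pd.read_csv(rdatasets_csv_url(package, name), index_col=0)
```

Something like `load_rdataset("datasets", "mtcars")` could then sit at the top of the Python example on a page, with the R, Julia, and Stata examples reading the same URL, so every implementation starts from identical input.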
Question: Do others agree? If so, I'll add some text to the Contributing page laying this out.
PS. I also think we should discourage use of really large files, especially since this is going to start becoming a drag on our GitHub Actions builds. There is one big offender here that I'll try to swap out when I get a sec. (Sorry, that's my student and I should have warned her about it.)