To split files (e.g. for test/train splits or k-folds) without having to load them into another tool first, awk will do a fine job.
For example, to crack a file into 16 equal parts, using the row number modulo 16 to assign each row to a file:
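A minimal sketch of the modulus approach, assuming a plain-text input with one record per line. The file names `data.csv` and `part_N.txt` and the variable `k` are hypothetical, and the first command only generates sample data:

```shell
#!/bin/sh
# Hypothetical sample input: 1,000 rows (replace with your own file).
seq 1 1000 > data.csv

# Round-robin each row into one of k files by row number modulo k:
# rows 1, 17, 33, ... go to part_1.txt, rows 16, 32, ... go to part_0.txt.
# The parentheses around the file-name expression keep the redirection unambiguous.
awk -v k=16 '{ print > ("part_" (NR % k) ".txt") }' data.csv
```

Because rows are dealt out in turn, the parts differ in size by at most one row. Changing `k` gives the folds for k-fold cross-validation.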
Or to crack a file into an 80/20 train/test split:
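One way to do this is sketched below, assuming a random (not positional) split is wanted. The file names are hypothetical, and the fixed seed `srand(42)` is an assumption added for reproducibility:

```shell
#!/bin/sh
# Hypothetical sample input: 1,000 rows (replace with your own file).
seq 1 1000 > data.csv

# rand() returns a uniform value in [0, 1), so roughly 80% of rows go to
# train.csv and the rest to test.csv. srand(42) fixes the seed; plain srand()
# would seed from the current time and give a different split each run.
awk 'BEGIN { srand(42) }
     { if (rand() < 0.8) print > "train.csv"; else print > "test.csv" }' data.csv
```

The split is only approximately 80/20, since each row is assigned independently.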
And finally, if your data file has a header row that you don't want to end up in a random file, you can write the header row into both output files first, then tell your awk script to append to them (and use tail to skip the header row of the input):
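A sketch of the header-preserving variant, again with hypothetical file names, an assumed 80/20 ratio and a fixed seed. The key detail is `>>`: awk's `>` truncates a file the first time it is opened in a run, which would wipe the header, whereas `>>` appends to it:

```shell
#!/bin/sh
# Hypothetical sample input: a header row followed by 1,000 data rows.
awk 'BEGIN { print "id,value"; for (i = 1; i <= 1000; i++) print i "," i * i }' > data.csv

# Copy the header row into both output files.
head -n 1 data.csv > train.csv
head -n 1 data.csv > test.csv

# tail -n +2 starts output at line 2, skipping the header; awk then appends
# (>>) each data row to train.csv or test.csv at random, roughly 80/20.
tail -n +2 data.csv | awk 'BEGIN { srand(42) }
     { if (rand() < 0.8) print >> "train.csv"; else print >> "test.csv" }'
```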