To split files (e.g. for test/train splits or k-folds) without having to load them into R or Python, awk will do a fine job.
For example, to crack into 16 equal parts using modulus to assign rows to files:
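A minimal sketch of the modulus approach, assuming a headerless input file. The file name `data.csv` and the `part_N.csv` output names are hypothetical, and the sample data is generated only so the snippet runs on its own:

```shell
# Build a small sample file (hypothetical name, no header row).
seq 1 160 > data.csv

# NR is the current row number. NR % 16 is a value from 0 to 15,
# so rows are dealt round-robin into part_0.csv ... part_15.csv.
awk '{ out = "part_" (NR % 16) ".csv"; print > out }' data.csv
```

Because rows are assigned by position rather than at random, the parts differ in size by at most one row. Shuffle the file first if the row order carries meaning.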
Or to crack a file into an 80/20 train/test split:
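One way to sketch this, assuming a random per-row assignment is acceptable: draw a number with awk's `rand()` for each row and send roughly 80% of rows to one file. The file names and the seed value 42 are hypothetical choices, and the split is only approximately 80/20 because each row is assigned independently:

```shell
# Build a small sample file (hypothetical name, no header row).
seq 1 1000 > data.csv

# srand(42) fixes the seed so the split is repeatable with the same awk.
# rand() returns a value in [0, 1); about 80% of rows land in train.csv.
awk 'BEGIN { srand(42) }
     { out = (rand() < 0.8) ? "train.csv" : "test.csv"; print > out }' data.csv
```

Calling `srand()` with no argument seeds from the clock instead, which gives a different split on each run.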
And finally, if your data file has a header that you don’t want to end up in a random file, you can write the header row into both files first, then tell your awk script to append to them (and use tail to skip the header row in the input):
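The header-preserving variant might look like the following sketch. The file names, the `id,value` header, and the seed are hypothetical. The key details are `tail -n +2`, which starts output at the second line, and awk's `>>` redirection, which appends instead of truncating the files that already hold the header:

```shell
# Build a small sample file with a header row (hypothetical contents).
{ echo "id,value"; seq 1 1000 | awk '{ print $1 "," $1 * 2 }'; } > data.csv

# Write the header into both output files, creating or truncating them.
head -n 1 data.csv > train.csv
head -n 1 data.csv > test.csv

# Skip the header with tail, then append each data row to one of the files.
tail -n +2 data.csv |
  awk 'BEGIN { srand(42) }
       { out = (rand() < 0.8) ? "train.csv" : "test.csv"; print >> out }'
```

Using `>` inside awk here would wipe the header on the first write to each file, which is why the append form matters.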