It’s very handy to be able to pop open a shell and peek in your csv files. awk is a command that will do just that: it divides each line into fields, splitting on whitespace by default or on a separator specified with -F. Here’s a script that prints the 8th data column, sorts it, and prints out the unique values. It runs in a couple of seconds on a 10-million-row csv file.
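A minimal sketch of that kind of pipeline might look like the following. The sample file and its values are made up for illustration, and the plain comma split assumes no field contains a quoted comma, which awk's -F would not handle.

```shell
# Hypothetical sample data, invented for this sketch: 8 comma-separated columns.
csv=$(mktemp)
printf '%s\n' \
  'a,b,c,d,e,f,g,red' \
  'a,b,c,d,e,f,g,blue' \
  'a,b,c,d,e,f,g,red' > "$csv"

# -F',' sets the field separator to a comma; $8 is the 8th field of each line.
# sort puts identical values next to each other so uniq can collapse them.
unique_values=$(awk -F',' '{ print $8 }' "$csv" | sort | uniq)
echo "$unique_values"
```

The sort step matters because uniq only collapses adjacent duplicate lines.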
If you prefer uniq to spit out the counts, it will do so with the -c argument:
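As a rough sketch, again using a made-up sample file and assuming simple comma-separated fields with no quoted commas, the counting variant could be written like this:

```shell
# Hypothetical sample data, invented for this sketch: 8 comma-separated columns.
csv=$(mktemp)
printf '%s\n' \
  'a,b,c,d,e,f,g,red' \
  'a,b,c,d,e,f,g,blue' \
  'a,b,c,d,e,f,g,red' > "$csv"

# uniq -c prefixes each distinct value with the number of times it occurred.
# The amount of leading padding before the count varies between uniq implementations.
counts=$(awk -F',' '{ print $8 }' "$csv" | sort | uniq -c)
echo "$counts"
```

Piping the result through sort -rn puts the most frequent values first.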
For the record, the csv file in question looks like this:
Equivalent code in R would be something like