Unfortunately, statistics and machine learning seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output, even before you can get to the work proper. A common problem is taking tab separate value (tsv) files, perhaps produced as the output of a mysql or postgres query, and turning them into comma separated value (csv) files.
Here’s one method, using sed and pretty standard regexp syntax:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
The key bit above is this: "s/\t/,/g"
. That says turn every tab (\t) into a comma (,). If you instead preferred to just remove tabs from the file period, you could use sed on "s/\t//g"
.
So, yet another thing I learned today: the version of sed
that ships with MacOS, even through 10.5.7, doesn’t support special character sequences. If the above isn’t working for you, and instead is just replacing every t character in the file with a comma, then try this:
1
|
|
Note that to type those tabs, you’ll have to hit ctrl-v (^V). If the output isn’t ",a,"
, then you have to type literal tabs in your sed
command. The \t
works under reasonable versions of linux; you’ll have to use literal tabs under OS X. Bleh.