Unfortunately, statistics and machine learning seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output, all before you can get to the work proper. A common problem is taking tab-separated value (TSV) files, perhaps produced as the output of a MySQL or PostgreSQL query, and turning them into comma-separated value (CSV) files.
Here’s one method, using sed and pretty standard regexp syntax:
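As a minimal sketch of that method (not the original listing), assuming GNU sed, which understands `\t` inside a regular expression. The helper name `tsv_to_csv` is hypothetical, and the substitution does no quoting, so fields that already contain commas will come out ambiguous:

```shell
#!/bin/sh
# tsv_to_csv: hypothetical helper name, used only for this sketch.
# Reads TSV on stdin and writes comma-separated lines to stdout.
# Assumes GNU sed, which treats \t in a regex as a tab character.
tsv_to_csv() {
    sed 's/\t/,/g'
}

# Example: convert a single tab-separated line.
printf 'id\tname\tscore\n' | tsv_to_csv
```

With GNU sed the example line comes out as `id,name,score`. To convert a whole file, redirect it in and out, for instance `tsv_to_csv < input.tsv > output.csv` (both file names are placeholders).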
The key bit above is this: "s/\t/,/g". That says turn every tab (\t) into a comma (,). If you would rather just remove tabs from the file entirely, you could run sed with an empty replacement instead: "s/\t//g".
So, yet another thing I learned today: the version of sed that ships with Mac OS X, even through 10.5.7, doesn’t support special character sequences such as \t. If the above isn’t working for you, and is instead just replacing every t character in the file with a comma, then test it: pipe a line consisting of a tab, the letter a, and another tab through the same sed command.
Note that to type those tabs, you’ll have to hit ctrl-v (^V) and then Tab. If the output isn’t ",a,", then you have to type literal tabs in your sed command. The \t works under reasonable versions of Linux; you’ll have to use literal tabs under OS X. Bleh.
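One way to avoid typing literal tabs by hand is to let `printf` produce the tab and splice it into the sed expression. This is a minimal sketch, assuming a POSIX shell; the `TAB` variable and the `tabs_to_commas` name are made up for illustration:

```shell
#!/bin/sh
# Build a literal tab once with printf, so the sed script
# never relies on sed itself understanding the \t escape.
TAB=$(printf '\t')

# tabs_to_commas: hypothetical helper name, used only for this sketch.
# Double quotes let the shell expand ${TAB} before sed sees the script.
tabs_to_commas() {
    sed "s/${TAB}/,/g"
}

# The tab, a, tab check: the line should come out as ",a,".
printf '\ta\t\n' | tabs_to_commas
```

Because the shell substitutes a real tab character before sed parses the expression, this should behave the same whether or not the local sed supports `\t`.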