splitter.py is a small script that uses ffmpeg to split single audio files into multiple tracks. It splits audio files via a setlist, then sets the song name, artist, and album id3 tags. The script is crude, but it’s a quick start.
I used this to split a couple concerts by two of my favorite artists: James McMurtry in Concert July 14 2013 and Ray Wylie Hubbard in Nashville, TN performing tracks from A. Enlightenment B. Endarkenment (Hint: There is no C). You can find the set lists below.
You’ll have to adjust the params at the top of `splitter.py`:

- `setfile` is the set file
- `mp3file` is the audio file
- `outdir` is the output directory (you should probably `mkdir` this beforehand)
- `meta_src` is the source
- `do_copy` should be `True` if your source is an mp3 file and `False` if you want to transcode to mp3
- `artist` and `album` in `metas`

Then just run it: `python splitter.py`.
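As a sketch of how such a splitter can work (the set-file format of `start-time<TAB>title` lines, the function names, and the exact ffmpeg flags here are my assumptions, not necessarily the original script’s):

```python
import subprocess

def parse_setlist(text):
    """Parse 'HH:MM:SS<TAB>title' lines into (start_time, title) pairs."""
    tracks = []
    for line in text.strip().splitlines():
        start, title = line.split("\t", 1)
        tracks.append((start, title))
    return tracks

def ffmpeg_cmd(mp3file, outdir, track_num, start, end, title, artist, album, do_copy=True):
    """Build one ffmpeg invocation that cuts [start, end) out of mp3file."""
    codec = ["-acodec", "copy"] if do_copy else ["-acodec", "libmp3lame"]
    cmd = ["ffmpeg", "-i", mp3file, "-ss", start]
    if end is not None:
        cmd += ["-to", end]
    cmd += codec + [
        "-metadata", "title=" + title,
        "-metadata", "artist=" + artist,
        "-metadata", "album=" + album,
        "%s/%02d %s.mp3" % (outdir, track_num, title),
    ]
    return cmd

def split(setfile_text, mp3file, outdir, artist, album, do_copy=True):
    """Cut one track per setlist entry; each track ends where the next begins."""
    tracks = parse_setlist(setfile_text)
    for i, (start, title) in enumerate(tracks):
        end = tracks[i + 1][0] if i + 1 < len(tracks) else None
        subprocess.check_call(
            ffmpeg_cmd(mp3file, outdir, i + 1, start, end, title, artist, album, do_copy))
```

The `-acodec copy` path just remuxes the slice, so splitting an mp3 is nearly instant; transcoding only happens when `do_copy` is off.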
If you’ve generated counts via `something | sort | uniq -c`, you can swap the order of the labels and the counts (i.e. change the order of columns) in vi via the following regex.
Highlight in visual mode with `V`, then run the following regex: `s/\(\d\+\)\s\+\([a-zA-Z0-9_]*\)/\2 \1/`
Or turn them into the correct format for a python dict via `s/\(\d\+\)\s\+\([a-zA-Z0-9_]*\)/'\2': \1,/`
Print numbered column names of a csv or tsv. You can specify a file, or it will read from stdin. It will guess the separator (whichever of tab or comma is more common), or you may specify one with `--separator`. This is particularly useful if you want to use awk to select columns.
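A minimal sketch of such a `colnum` script, reconstructed from the description above (function names and output format are my assumptions):

```python
def guess_separator(header):
    """Pick tab or comma, whichever occurs more often in the header line."""
    return "\t" if header.count("\t") > header.count(",") else ","

def number_columns(header, separator=None, python_dict=False):
    """Render numbered column names, or a zero-index based lookup dict."""
    sep = separator or guess_separator(header)
    names = header.rstrip("\n").split(sep)
    if python_dict:
        return "{" + ", ".join("'%s': %d" % (n, i) for i, n in enumerate(names)) + "}"
    return "\n".join("%d %s" % (i + 1, n) for i, n in enumerate(names))

# In the real script this would read a file argument or stdin, e.g.:
#   print(number_columns(sys.stdin.readline()))
```

Numbering from 1 in the default output lines up with awk’s `$1`, `$2`, … field numbering, which is the point of the tool.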
For example: `head -1 data.csv | colnum`, or `colnum --separator , data.csv`, etc.
There are two options: `--separator` forces a separator, and `--python_dict` prints a zero-index based lookup dict.
I recently switched to fastmail in lieu of gmail, mostly because I increasingly dislike google’s stance on privacy, their integration between products, and their ongoing updates to gmail. I unfortunately updated gmail on my phone, and their new material design ethos, designed by an idiot who thinks whitespace belongs everywhere, wastes tons of space already in short supply. I can now see only 5.5 messages in the inbox view, where I used to see 8: an incredibly annoying change to the most important screen. So I switched.
A review of fastmail, several months in:
tl;dr: gmail is a better web application, and a better android application. Choose fastmail if you value privacy; choose gmail otherwise.
`a b c` pastes as `abc`. Wat?
In summary, there are just a lot of annoyances that make me assume the devs don’t use their own product, or they’d fix them out of sheer annoyance. But they don’t sell your information, or decide to shrink the number of messages viewable in your inbox in order to conform to some stupid corporate design ethos.
a) $p(x_0 + 1) = \beta_1 + p(x_0)$
b) $\frac{ p(x_0 + 1) }{ 1 - p(x_0 + 1) } = \exp(\beta_1) \frac{ p(x_0) }{ 1 - p(x_0) }$
c) $p(x_0 + 1) = \Phi(\beta_0 + \beta_1 (x_0 + 1))$
Assume we run a logistic regression on the 1-dimensional data below. What happens?
a) $-\infty < \beta_0 < \infty$; $\beta_1 \rightarrow \infty$
b) $\beta_0 = 0$, $\beta_1 = 0$
c) $\beta_0 = 0$; $\beta_1 \rightarrow -\infty$
d) none of the above
Now, regress $Y$ on $X_1$ or $Y$ on $X_2$ alone. Both $\beta_1$ and $\beta_2$ would be positive.
If you regress $Y$ on $X_1 + X_2$, what are the signs of $\beta_1$ and $\beta_2$?
Consider holding $X_2$ constant: if $X_1$ increases by 1, i.e. you turn a penny, nickel, or dime into a quarter, then $Y$ surely increases. Therefore $\beta_1$ is positive.
Now consider holding $X_1$ constant and increasing $X_2$. If the number of pennies, nickels, and dimes increases while the total number of coins stays constant, you’re replacing quarters with lower valued coins. Thus increasing $X_2$ can decrease $Y$, so it is entirely possible that $\beta_2$ is negative.
Updated 26 August 2015.
Here’s the reason I love R: this can be accomplished in 3 lines of code.
First, say goodbye to requiring Apache Commons for really simple functionality, like joining a string!
java8 also massively cleans up some common operations. A common interview question: given an array or list of words, print them in descending order by count, or return the top n sorted by count descending. A standard program might create a map from string to count, reverse it to map each count to the words with that count, then descend to the correct depth.
The dummy data provided has these counts:
The dummy data has a few repeated words, so there are ties in the counts.
This will produce the words and counts sorted by count descending.
Using java8 streams, we can clean up much of this. For starters, creating the map from word to word count is essentially built in.
Java8 also directly supports inverting or reversing a map, replacing the need to either do it by hand or use guava’s bi-directional map. In the common case, where values are unique, a single `Collectors.toMap` call will suffice.
Unfortunately, in my case that throws an exception because there is more than one word with the same count, so it’s slightly more complicated.
But I really want a treemap, so I can iterate over the keys in order. Fortunately, I can specify which type of map I want.
It’s worth noting that the python is simpler still…
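A `Counter`-based sketch of the python version (my reconstruction, not the original listing; the sample words are made up):

```python
from collections import Counter

def top_words(words, n=None):
    """Return (word, count) pairs sorted by count descending."""
    # most_common does the grouping and the descending sort in one call
    return Counter(words).most_common(n)

words = "the quick brown fox jumps over the lazy dog the fox".split()
for word, count in top_words(words, 3):
    print(word, count)
```

Compare this to the Java version: the map-building, inversion, and depth-limited descent all collapse into `most_common(n)`.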
Let $H_i$ be the event we observe head on the $i$th flip, and let $C_i$ be the event we draw the $i$th coin, $i = 1,…,10$.
Then we wish to calculate (using range syntax for brevity) $$P(H_6 | H_1 H_2 H_3 H_4 H_5) = P(H_6 | H_{1:5})$$
Conditioning on which coin we drew, and exploiting the symmetry between coins 1 to 9:
So it just remains to calculate $P(C_i | H_{1:5})$. This can be done via Bayes’ rule:
where, playing the same conditioning trick:
Thus:
Note that we can quickly self-test and verify $ \sum_{i=1}^{10} P(C_i | H_{1:5}) = 1 $.
Returning to eqn (2)
Alternatively, you can use R to calculate the probability via brute force by repeatedly sampling according to our problem and counting the number of heads observed.
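The R listing hasn’t survived here, but the same brute-force check can be sketched in Python. Note the setup is my assumption (the problem statement is missing from this copy): ten coins, nine fair and one two-headed, draw one uniformly at random and flip it.

```python
import random

def simulate(trials=100_000, seed=1):
    """Estimate P(heads on flip 6 | heads on flips 1-5) by rejection sampling."""
    rng = random.Random(seed)
    conditioned = sixth_heads = 0
    while conditioned < trials:
        p_heads = 1.0 if rng.randrange(10) == 9 else 0.5  # coin 10 is two-headed
        if all(rng.random() < p_heads for _ in range(5)):  # keep runs of 5 heads
            conditioned += 1
            sixth_heads += rng.random() < p_heads
    return sixth_heads / conditioned

# Under this assumed setup the analytic answer is 36.5/41, about 0.8902.
print(simulate())
```

Rejection sampling mirrors the conditioning directly: throw away every draw that doesn’t produce five heads, then count how often the sixth flip is heads among the survivors.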
My sample run produced an estimate in close agreement with the analytic answer.
A quick story about `const`ness.
What on earth? It turns out the winner is a beautiful bit of declaration syntax.
So for all future googlers, this is how you declare const double arrays or const multidimensional arrays in c++.
You don’t need `R` or `python` for this; `awk` will do a fine job.
For example, to crack into 16 equal parts using modulus to assign rows to files:
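The one-liner is missing from this copy; a sketch of the sort of thing described (file names are assumptions):

```shell
seq 1 32 > data.txt   # stand-in data so the sketch runs end to end
# row N goes to file part_(N mod 16), giving 16 roughly equal parts
awk '{ print > ("part_" NR % 16) }' data.txt
```

Note the parentheses around the file-name expression: awk needs them to treat the concatenation as the redirection target.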
Or to crack a file into a 80/20 test/train split:
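Again the listing is gone; presumably something along these lines, with `rand()` deciding each row’s file (names and seed assumed):

```shell
seq 1 1000 > data.txt   # stand-in data
# send ~20% of rows to test.txt and the rest to train.txt
awk 'BEGIN { srand(42) } { if (rand() < 0.2) print > "test.txt"; else print > "train.txt" }' data.txt
```

This gives an 80/20 split in expectation rather than exactly, which is usually fine for test/train purposes.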
And finally, if your data file has a header that you don’t want to end up in a random file, you can dump the header row into both files, then tell your `awk` script to append (and use `tail` to skip the header row).
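A sketch of that header-preserving variant (file names assumed; note `>>` inside awk, since the header rows already exist):

```shell
printf 'id,val\n1,a\n2,b\n3,c\n4,d\n' > data.csv   # stand-in data with a header
head -1 data.csv > train.csv    # header into both outputs
head -1 data.csv > test.csv
# tail skips the header; >> appends rows after the pre-written headers
tail -n +2 data.csv | awk 'BEGIN { srand(7) } { if (rand() < 0.2) print >> "test.csv"; else print >> "train.csv" }'
```

The `>>` matters: a plain `>` inside awk would truncate the files on first write and clobber the headers you just put there.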
An aside: `php-cgi` is so fragile and crashes so often that I ran a screen session as root that just attempted to restart it every 5 seconds, for any poor souls stuck using this tech.
For googlers who want to move from wordpress to octopress, here’s how I moved 70-odd posts with minimal pain.
1 – Get thomasf’s excellent python script (accurately named exitwp) that converts wordpress posts to octopress posts. This will create one octopress post per wordpress post in the `source` directory.
2 – I simultaneously moved urls from `blog.earlh.com` to `earlh.com/blog`, so I needed to 301 all the old posts. I did that by getting this awesome wordpress post exporter script contributed by Mike Schinkel. I curled that to create a list of urls to forward, then built a tsv of `old url\tnew url` pairs. A short awk script over that tsv prints nginx forward rules.
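The awk script itself hasn’t survived here; a sketch of one that does the job (the exact rule format is an assumption):

```shell
printf '/old-post\thttp://earlh.com/blog/old-post/\n' > redirects.tsv  # old_url<TAB>new_url pairs
# print one permanent (301) rewrite rule per tsv line
awk -F'\t' '{ print "rewrite ^" $1 "$ " $2 " permanent;" }' redirects.tsv
```

`permanent` is what makes nginx issue a 301 rather than a 302, which is what you want for moved posts.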
Add the rules to your site `nginx.conf` file inside the `server` configuration block.
I’ll update with solutions for better image embedding.
What was the cause of this monstrosity?
So yeah: you can’t copy thread objects, which is enforced by a private copy constructor. Still, the amount of knowledge it takes to translate from the error message to the actual error is pretty amazing.
In R you’d reach for `unique` or `table`, but if your data is largish it may be quite annoying to load into R. I often use bash to quickly pick out a column, ala:
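The one-liner itself is missing from this copy, but the step-by-step description that follows pins it down; it was presumably along these lines (column 8, comma-separated; filename assumed):

```shell
printf 'a,b,c,d,e,f,g,x\na,b,c,d,e,f,g,y\na,b,c,d,e,f,g,x\n' > data.csv  # stand-in rows
# count distinct values in column 8, most frequent first
cat data.csv | awk -F, '{ print $8 }' | sort | uniq -c | sort -nr
```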
In order: bash `cat`s my data; `awk` prints just column 8 using `,` as the separator field; `sort` sorts all the data so that I can use `uniq`; `uniq -c` prints the counts and then the unique strings; and the final `sort` orders by the counts descending (`-n` interprets them as numbers and `-r` sorts descending). The obvious inefficiency here is that if your data is a couple of gb, you have to sort it all just for `uniq` to work. Instead, you can add the script below to your path and replace the tail of the pipeline with it.
Not only is this a lot less typing, but it will be significantly faster, since you don’t have to hold all the data in ram and sort it.
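The script hasn’t survived in this copy; a minimal Python stand-in that counts values from stdin without sorting the whole input (a sketch, not the original) could be:

```python
from collections import Counter

def count_lines(lines):
    """Count occurrences of each line without sorting the input first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    # only the distinct values get sorted, not the full data set
    return counts.most_common()

# In the real script, read stdin and print uniq -c style output:
#   for value, count in count_lines(sys.stdin):
#       print(count, value)
```

This is the whole trick: a hash map of counts is O(n) over the data with memory proportional to the number of distinct values, versus `sort`’s need to handle every row.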
I will miss comments, but I hope people will email instead. That said, of the nearly 20,000 comments my site has received, I believe fewer than thirty weren’t spam. In fact, wordpress has a whole cottage industry around a comment spam control tool called Akismet, created to fix how easy wordpress makes comment spam.
There’s only one company that (should) have ever seen the highlighted email address. It’s also not a common word that you would find in a dictionary attack.
Open `mapred-site.xml` and add:
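The property block itself is missing from this copy; for the Hadoop 1.x that EMR ran at the time, it was presumably the per-tasktracker map slot count, something like (the value is an example):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```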
To push the changes to all the machines, use the script to modify mapper or reducer count on a running emr cluster.
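For a new cluster, the missing snippet presumably passed the overrides at launch via the configure-hadoop bootstrap action, roughly like the following (the flag syntax here is from memory and should be treated as an assumption):

```
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
--args "-m,mapred.tasktracker.map.tasks.maximum=4,-m,mapred.tasktracker.reduce.tasks.maximum=2"
```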
as appropriate to the elastic-mapreduce.rb command.
For a running emr cluster, you can use the following scripts. Navigate to the conf directory; it will be in a path similar to /home/hadoop/.versions/1.0.3/conf
Edit `mapred-site.xml` and replace either or both of the relevant properties.
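The two properties in question are presumably the Hadoop 1.x per-tasktracker slot counts (values here are examples):

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
```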
Then push the edited file out to the other machines and restart the tasktrackers so the change takes effect.
One way to verify this worked is on the jobtracker web page.
Jason Aten was kind enough to fix this for Snow Leopard and later, as detailed in the lush mailing list archive. Grab Jason’s lush2 git repo from github.