# Splitting Audio With Ffmpeg

Here’s a quick utility that uses a set list and ffmpeg to split a single audio file into multiple tracks, then sets the song name, artist, and album id3 tags on each track. The script is crude, but it’s a quick start.

I used this to split a couple concerts by two of my favorite artists: James McMurtry in Concert July 14 2013 and Ray Wylie Hubbard in Nashville, TN performing tracks from A. Enlightenment B. Endarkenment (Hint: There is no C). You can find the set lists below.

You’ll have to adjust params at the top of splitter.py:

• setfile is the set file
• mp3file is the audio file
• outdir is the output directory (you should probably mkdir this beforehand)
• meta_src is the source
• do_copy should be True if your source is an mp3 file and False if you want to transcode to mp3
• replace artist and album in metas

Then just run it: python splitter.py.
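For readers without splitter.py at hand, here is a minimal sketch of the general idea, not the actual script. It assumes a hypothetical setlist format of `(start_seconds, title)` pairs, and the function names are made up for illustration. The only ffmpeg options it relies on are `-i`, `-ss`, `-t`, `-c copy`, `-codec:a libmp3lame`, and `-metadata`.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, start, end, tags, do_copy=True):
    """Build the ffmpeg argument list for one track (times in seconds)."""
    cmd = ["ffmpeg", "-i", src, "-ss", str(start)]
    if end is not None:
        # -t is a duration; the last track has no end and runs to end of file.
        cmd += ["-t", str(end - start)]
    # Stream-copy when the source is already mp3, otherwise transcode to mp3.
    cmd += ["-c", "copy"] if do_copy else ["-codec:a", "libmp3lame"]
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    return cmd + [dst]


def split_tracks(src, outdir, setlist, artist, album, do_copy=True):
    """setlist: hypothetical list of (start_seconds, title), sorted by start."""
    cmds = []
    for n, (start, title) in enumerate(setlist):
        end = setlist[n + 1][0] if n + 1 < len(setlist) else None
        tags = {"title": title, "artist": artist, "album": album}
        dst = f"{outdir}/{n + 1:02d} {title}.mp3"
        cmds.append(build_ffmpeg_cmd(src, dst, start, end, tags, do_copy))
    return cmds


def run_all(cmds):
    """Run each ffmpeg command, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before ffmpeg touches any files.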

# Vi Swapping Order of Tables of Counts and Labels

A note to myself: if you have a table of counts and labels, perhaps created by something | something | uniq -c, you can swap the order of the labels and the counts / change the order of columns in vi via the following regex.

highlight in visual mode with V then run the following regex: `s/\(\d\+\)\s\+\([a-zA-Z0-9_]*\)/\2 \1/`

or turn them into the correct format for a python dict via `s/\(\d\+\)\s\+\([a-zA-Z0-9_]*\)/'\2': \1,/`
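If the table lives in a file rather than a vi buffer, the same two substitutions can be sketched in Python with `re.sub`. The function names here are made up for illustration. As with the vi command, only the first match on each line is rewritten and leading whitespace from `uniq -c` is left alone.

```python
import re

# Same pattern as the vi regex, in Python syntax: a count, whitespace, a label.
COUNT_LABEL = re.compile(r"(\d+)\s+([a-zA-Z0-9_]*)")


def swap_columns(line):
    """Rewrite 'count label' as 'label count'."""
    return COUNT_LABEL.sub(r"\2 \1", line, count=1)


def as_dict_entry(line):
    """Rewrite 'count label' as a python dict entry: 'label': count,"""
    return COUNT_LABEL.sub(r"'\2': \1,", line, count=1)
```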

# Shell Utilities for Data Analysis

Quick utilities to help with data analysis from the shell:

Print numbered column names of a csv or tsv. You can specify a file or it will read from stdin. It will also guess the separator, whichever of tab or comma is more common; or you may specify with --separator. This is particularly useful if you want to use awk to select columns.

Example one:

or head -1 data.csv | colnum or colnum --separator , data.csv etc.

There are two options: --separator forces a separator, and --python_dict prints a zero-index based lookup dict like so:
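A rough Python sketch of the behavior described above, not the actual colnum source. The function names, the one-based numbering (chosen because awk fields start at 1), and the output format are all assumptions. The naive `split` also ignores quoted separators inside fields.

```python
def guess_separator(header):
    """Pick tab or comma, whichever occurs more often in the header line."""
    return "\t" if header.count("\t") > header.count(",") else ","


def numbered_columns(header, separator=None, python_dict=False):
    """Number the column names found in a csv/tsv header line."""
    sep = separator or guess_separator(header)
    names = header.rstrip("\r\n").split(sep)
    if python_dict:
        # Zero-based lookup dict, as the --python_dict option describes.
        return {name: i for i, name in enumerate(names)}
    # One-based numbering (an assumption) so the numbers match awk's $1, $2, ...
    return [f"{i} {name}" for i, name in enumerate(names, start=1)]
```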

# Fastmail Thoughts

last updated Saturday 20 June 2015

I recently switched to fastmail in lieu of gmail, mostly because I increasingly dislike google’s stance on privacy, their integration between products, and their ongoing updates to gmail. I unfortunately updated gmail on my phone, and their new material design ethos was designed by an idiot who thinks that they should have whitespace everywhere, wasting tons of space already in short supply. I now can only see 5.5 messages in the inbox view, whereas I used to be able to see 8, an incredibly annoying change in the most important screen. So I switched.

A review of fastmail several months in:

tl;dr: gmail is a better web application, and a better android application. Choose fastmail if you value privacy; choose gmail otherwise.

## Positives

• it’s not gmail
• privacy
• gmail shrunk the view window on android for some stupid flat design rationale; they appear to assume everyone reads email on a 6 inch phone

## Negatives

• fastmail pretends to be a gmail style email client where the unit of manipulation is a conversation, not a message. But the underlying message orientation peeks through in many cases.
• When deleting a conversation, it has repeatedly asked if I want to delete the entire conversation (what else would I want?) and had a Yes/No for don’t ask me again. I’ve clicked “don’t ask again” at least 3 times. It doesn’t take.
• if you archive a conversation, the sent emails also move to archive out of sent. This is wrong.
• Settings feels like my first javascript project.
• routing rules have to be very simple and sometimes don’t work.
• The UI for setting up routing rules is shit; you have to add them, click, add, then scroll to the top of a very long page and click “apply all changes” for the rules to take (yes, I missed that while porting rules from my old webmail and had to redo 40+ rules). It’s essentially two-phase commit ala git; not at all what I expected for a webmail ui.
• The rules don’t work as you would expect: eg messages from “a@b.com” do not match “sender ends with” “b.com”.
• Rules can only filter on one thing at a time — no compound rules on eg sender and subject. When you create a rule, it doesn’t offer to apply to existing messages in the inbox.
• Rules can’t use “or”. So if you filter on receiver, you can’t say a@b.com or b@b.com or c@b.com. Instead, you have to have one rule per each. By the time you have 100+ of these, it’s damn annoying.
• spam filtering is crappy:
• When you mark something as not spam, it is delivered to your inbox and skips rules.
• there’s no ability to sort by spam score. Hopefully the most likely nonspam would have the lowest score, so it would be convenient to sort by that to find nonspam.
• the spam filter doesn’t learn: I’ve had to mark a loan payment confirmation email as not-spam every single month I’ve used fastmail
• It sometimes loses the send button while composing messages.
• No option to “filter emails like this”; instead, you have to copy and paste eg the address you want to filter into a screen 3 clicks away.
• By default, it doesn’t load images in html email. There is a link that tries to load the images in the email you’re viewing; it works perhaps 2/3 of the time.
• The rich editor is crap.
• For just one of a long list: paste tsv data in there; it strips all the tabs. Awesome. So a b c pastes as abc. Wat?
• the mobile site on firefox lags typing like 10+ seconds if you have a quoted reply in the message box. It’s strictly amateur hour.
• The application disables access to files on mobile phones even after requesting the desktop site in firefox. surprise!

• They attempt to monetize security in an incredibly stupid way. If you set up two factor authentication to text a code to your phone, they charge for the sms messages (0.12 each!) even on $40/year accounts. That’s just chintzy. Better yet, because they’re run by cheap dicks, purchased sms credits expire after a year!!! When I saw that it felt like purchasing a prepaid cellphone at a gas station level cheap. They’re seriously pricing at 1600 percent of the pricing twilio has on their web page for joe-random-user, not even considering volume discounts. Monetizing security makes you an asshole.
• Their calendar implementation doesn’t understand meeting requests from Outlook. For example, I got a meeting request for 3pm PDT (sent as 2200 Greenwich; see excerpt from the calendar invite below) that Fastmail interpreted as 2pm PDT / 10pm BST. What on earth? Exchange is only the most common professional calendar server; why would you assume fastmail interoperates with Outlook?

In summary, there are just a lot of annoyances that make me assume the devs don’t use their own product or they’d fix it out of sheer annoyance. But they don’t sell your information, or decide to shrink the number of messages viewable in your inbox in order to conform to some stupid corporate design ethos.

# Regression Questions: Logistic Regression Probabilities

Assume we have a logistic regression of the form $\beta_0 + \beta_1 x$, and for value $x_0$ we predict success probability $p(x_0)$. Which of the following is correct?

a) $p(x_0 + 1) = \beta_1 + p(x_0)$

b) $\frac{ p(x_0 + 1) }{ 1 - p(x_0 + 1) } = \exp(\beta_1) \frac{ p(x_0) }{ 1 - p(x_0) }$

c) $p(x_0 + 1) = \Phi(\beta_0 + \beta_1 (x_0 + 1))$

Assume we run a logistic regression on the 1-dimensional data below. What happens?
a) $-\infty < \beta_0 < \infty$; $\beta_1 \rightarrow \infty$

b) $\beta_0 = 0$, $\beta_1 = 0$

c) $\beta_0 = 0$; $\beta_1 \rightarrow -\infty$

d) none of the above

# Regression Questions: A Coin Teaser

This is a straightforward question that elucidates whether you understand regression, particularly the ceteris paribus interpretation of multiple regression.

• let $Y$ be the total value of change in your pocket;
• let $X_1$ be the total number of coins;
• let $X_2$ be the total number of pennies, nickels, and dimes.

Now, regress $Y$ on $X_1$ or $Y$ on $X_2$ alone. Both $\beta_1$ and $\beta_2$ would be positive. If you regress $Y$ on both $X_1$ and $X_2$ together (ie $Y \sim X_1 + X_2$), what are the signs of $\beta_1$ and $\beta_2$?

Consider holding $X_2$ constant: if $X_1$ increases by 1, ie you add a coin that is not a penny, nickel, or dime (a quarter, say), then $Y$ surely increases. Therefore $\beta_1$ is positive.

Now consider holding $X_1$ constant and increasing $X_2$. If the number of pennies, nickels, and dimes increases while the total number of coins stays constant, you’re replacing quarters with a lower valued coin. Thus increasing $X_2$ can decrease $Y$, so it is entirely possible that $\beta_2$ is negative.

Updated 26 August 2015.

# Interview Questions in R

Previously, I wrote about a common interview question: given an array of words, output them in decreasing frequency order, and I provided solutions in java, java8, and python. Here’s the reason I love R: this can be accomplished in 3 lines of code.

produces

# Java8 Improvements

java8 has a bunch of nice improvements, and over the holidays I’ve had time to play with them a bit.

First, say goodbye to requiring Apache Commons for really simple functionality, like joining a string!

java8 also massively cleans up some common operations. A common interview question is given an array or list of words, print them in descending order by count, or return the top n sorted by count descending.
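In Python, the task just described (words in descending order by count, optionally only the top n) can be sketched with `collections.Counter`. The function name `by_frequency` is made up for illustration and is not taken from the original solutions.

```python
from collections import Counter


def by_frequency(words, n=None):
    """Return (word, count) pairs, most frequent first; n keeps only the top n."""
    return Counter(words).most_common(n)
```

Words with equal counts come back in the order they were first seen, so a caller that needs a particular tie-break has to sort explicitly.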
A standard program to do this may go like this: create a map from string to count; reverse the map to go from count to array of words with that count, then descend to the correct depth.

The dummy data provided has these counts:

this will produce output like:

Using java8 streams, we can clean up much of this. For starters, creating the map from word -> word count is essentially built in.

Java8 also directly supports inverting or reversing a map, replacing the need to either do it by hand or use guava’s bi-directional map. In the common case, where values are unique, this will suffice:

Unfortunately, in my case that throws an exception because there is more than one word with the same count. So it’s slightly more complicated:

But I really want a treemap, so I can iterate over the keys in order. Fortunately, I can specify which type of map I want

it’s worth noting the python is simpler still…

# Probability Problems Coin Flips 01

You have an urn with 10 coins in it: 9 fair, and one that is heads only. You draw a coin at random from the urn, then flip it 5 times. What is the probability that you get a head on the 6th flip given you observed head on each of the first 5 flips?

Let $H_i$ be the event we observe head on the $i$th flip, and let $C_i$ be the event we draw the $i$th coin, $i = 1, \ldots, 10$. Then we wish to calculate (using range syntax for brevity)

$$P(H_6 | H_1 H_2 H_3 H_4 H_5) = P(H_6 | H_{1:5})$$

Conditioning on which coin we drew, and exploiting the symmetry between coins 1 to 9:

\begin{align}
P(H_6 | H_{1:5}) & = \sum_{i=1}^{10} P(H_6 | H_{1:5}, C_{i}) P(C_i | H_{1:5} ) \\
& = 9 \cdot P(H_6 | H_{1:5}, C_1) P(C_1 | H_{1:5}) + P(H_6 | H_{1:5}, C_{10}) P(C_{10} | H_{1:5} )
\end{align}

So it just remains to calculate $P(C_i | H_{1:5})$.
This can be done via bayes rule:

$$P(C_i | H_{1:5}) = \frac{ P(H_{1:5} | C_i ) P(C_i) }{ P(H_{1:5}) }$$

where, playing the same conditioning trick:

\begin{align}
P(H_{1:5}) &= \sum_{i=1}^{10} P(H_{1:5} | C_i ) P(C_i) \\
& = \sum_{i=1}^{9} P(H_{1:5} | C_i) P(C_i) + P(H_{1:5} | C_{10}) P(C_{10}) \\
& = 9 \cdot \left( \frac{1}{2} \right)^5 \frac{1}{10} + 1^5 \frac{1}{10}
\end{align}

Thus:

\begin{align}
P(C_1 | H_{1:5}) & = \frac{ P(H_{1:5} | C_1 ) P(C_1) }{ 9 \cdot \left( \frac{1}{2} \right)^5 \frac{1}{10} + 1^5 \frac{1}{10} } \\
& = \frac{ \left( \frac{1}{2} \right)^5 \frac{1}{10} }{ 9 \cdot \left( \frac{1}{2} \right)^5 \frac{1}{10} + 1^5 \frac{1}{10} } \\
& = \frac{1}{9 + 2^5} \\
& = \frac{1}{41} \\
& \\
P(C_{10} | H_{1:5}) & = \frac{ P(H_{1:5} | C_{10} ) P(C_{10}) }{ 9 \cdot \left( \frac{1}{2} \right)^5 \frac{1}{10} + 1^5 \frac{1}{10} } \\
& = \frac{ 1^5 \frac{1}{10} }{ 9 \cdot \left( \frac{1}{2} \right)^5 \frac{1}{10} + 1^5 \frac{1}{10} } \\
& = \frac{32}{9 + 32} \\
& = \frac{32}{41}
\end{align}

Note that we can quickly self-test and verify $\sum_{i=1}^{10} P(C_i | H_{1:5}) = 9 \cdot \frac{1}{41} + \frac{32}{41} = 1$.

Returning to eqn (2)

\begin{align} P(H_6 | H_{1:5}) & = 9 \cdot P(H_6 | H_{1:5}, C_1) P(C_1 | H_{1:5}) + P(H_6 | H_{1:5}, C_{10}) P(C_{10} | H_{1:5} ) \\ & = 9 \cdot \frac{1}{2} \frac{1}{41} + 1 \cdot \frac{32}{41} \\ & = \frac{73}{82} \end{align}

Alternatively, you can use R to calculate the probability via brute force by repeatedly sampling according to our problem and counting the number of heads observed.

my sample run produced
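The same brute-force idea can be sketched in Python rather than R; this is not the original listing. The function names, trial count, and seed are arbitrary choices. `exact_probability` redoes the Bayes calculation above with exact fractions, and `simulate` keeps only the trials whose first 5 flips are all heads, then reports how often the 6th flip is a head.

```python
import random
from fractions import Fraction


def exact_probability(n_fair=9, n_heads_only=1, observed_heads=5):
    """P(next flip is a head | observed_heads heads in a row), via Bayes rule."""
    total = n_fair + n_heads_only
    like_fair = Fraction(1, 2) ** observed_heads
    evidence = Fraction(n_fair, total) * like_fair + Fraction(n_heads_only, total)
    post_fair = Fraction(n_fair, total) * like_fair / evidence
    post_heads_only = Fraction(n_heads_only, total) / evidence
    return post_fair * Fraction(1, 2) + post_heads_only


def simulate(trials=200_000, seed=0):
    """Estimate the same probability by sampling and conditioning on 5 heads."""
    rng = random.Random(seed)
    kept = heads_on_sixth = 0
    for _ in range(trials):
        # Coin number 9 (one in ten) is the heads-only coin.
        p_head = 1.0 if rng.randrange(10) == 9 else 0.5
        flips = [rng.random() < p_head for _ in range(6)]
        if all(flips[:5]):
            kept += 1
            heads_on_sixth += flips[5]
    return heads_on_sixth / kept
```

`exact_probability()` returns `Fraction(73, 82)`, matching the derivation; the simulated estimate should land close to that value, with the gap shrinking as `trials` grows.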

# C++ and Const. Sigh.

well known benefits of constness.

what on earth? It turns out the winner is this beautiful bit of syntax:

beautiful.

So for all future googlers, this is how you declare const double arrays or const multidimensional arrays in c++.