Here’s a quick utility to use a set list and ffmpeg to split single audio files into multiple tracks. It splits audio files via a setlist then sets the song name, artist, album id3 tags. The script is crude, but it’s a quick start.
I used this to split a couple concerts by two of my favorite artists: James McMurtry in Concert July 14 2013 and Ray Wylie Hubbard in Nashville, TN performing tracks from A. Enlightenment B. Endarkenment (Hint: There is no C). You can find the set lists below.
# use ffmpeg to split an mp3 according to a set listfrom__future__importprint_functionimportsubprocessassp# NB: add -copy if from mp3 -> mp3; else no -copysetfile='./Ray Wylie Hubbard in Concert - December 17, 2010 at Tennessee State Museum.set'mp3file='./Ray Wylie Hubbard full concert - DASH.m4a'outdir='./Ray Wylie Hubbard in Concert - December 17, 2010 at Tennessee State Museum'meta_src='https://www.youtube.com/watch?v=y9_xBIuV9nE'do_copy=Falseprint('splitting \'%s\' according to sets in \'%s\' into directory \'%s\''%(mp3file,setfile,outdir))metas={'artist':'Ray Wylie Hubbard','album':'Acoustic Live in Concert December 17 2010 at Tennessee State Museum / Hippie Jacks'}metastring=' '.join(['-metadata %s="%s"'%(k,v)fork,vinmetas.iteritems()])songs=[]withopen(setfile)asf:forlineinf:start,title=line.strip().split(' ',1)songs.append((start,title))print('read %d lines'%len(songs))# set id3 tags ala http://jonhall.info/how_to/create_id3_tags_using_ffmpegcmds=[]foriinrange(len(songs)-1):cmd=(songs[i][0],songs[i+1][0],songs[i][1])cmds.append(cmd)run='ffmpeg -i \"%s\" -acodec %s -ss %s -to %s '%(mp3file,'copy'ifdo_copyelse'libmp3lame -qscale:a 2',cmd[0],cmd[1])run+='%s -metadata title="%s" -metadata track="%d/%d" -metadata publisher="%s" '%(metastring,cmd[2],i+1,len(songs)-1,meta_src)run+='"%s/%02d.mp3"'%(outdir,i+1)# print("\tcmd: %s" % run)sp.call(run,shell=True)
You’ll have to adjust params at the top of splitter.py:
setfile is the set file
mp3file is the audio file
outdir is the output directory (you should probably mkdir this beforehand)
meta_src is the source
do_copy should be True if your source is an mp3 file and False if you want to transcode to mp3
00:00:00 chirping
00:00:48 Bayou Tortous
00:05:04 Red Dress
00:11:00 talking / tuning
00:11:52 What's the Matter Now
00:16:09 talking / real sensitive little love ballads and baptists
00:17:18 How'm I Gonna Find You Now
00:21:28 talking / buy cds
00:22:09 Just Us Kids
00:27:00 applause
00:27:27 You'd A' Thought (Leonard Cohen Must Die)
00:33:34 clapping and get ready to dance
00:33:47 Choctaw Bingo (with new verse) (introduction)
00:34:30 Choctaw Bingo (with new verse)
00:46:22 applause / belly rubbing dances
00:46:49 These things i've come to know
00:50:26 talking / introduction to the band / unfortunately song still relevant
00:51:44 We Can't Make it Here Anymore
00:58:37 talking about ice fishing and Tim Holt
00:59:37 Copper Canteen
01:04:22 talking / back to stompin' songs
01:04:41 Freeway View
01:08:03 talking / feral hogs in michigan
01:08:33 Lobo Town
01:14:00 applause
01:14:07 Restless
01:19:17 talking / goes out to memory of max crawford
01:20:02 Levelland
01:26:40 talking / thanks crowd / reminds crowd to bring whiskey / be nice to the nice officer
01:28:21 Too Long in the Wasteland
01:35:36 applause
01:35:50 Lights of Cheyenne (acoustic)
01:41:51 thanks and watch for deer (god bless wisconsin)
01:42:42 finit
00:00:00 hippie jacks intro
00:03:03 talking / rwh enters
00:03:18 Count My Blessings
00:06:43 talking / glad to be here / band on weekend work release
00:07:09 Rabbit
00:10:02 talking / get what you pay for
00:10:23 Snake Farm
00:14:14 talking / Hayes Carll / smell
00:16:18 Drunken Poet's Dream
00:20:00 talking / get more attention burning down a barn
00:21:29 Down Home Country Blues
00:24:18 talking / hardly strictly bluegrass / commitment
00:26:25 The Ballad of the Crimson Kings
00:30:57 talking / Without Love banter
00:32:40 Without Love
00:36:28 tuning
00:36:43 Dust of the Chase
00:40:50 applause / talking / tuning / remembering Mambo John Traynor
00:43:30 Name Droppin'
00:46:44 talking / ain't no rock and roll roosters
00:47:30 Rooster
00:50:12 talking / gambler and a drinker's gold bullet
00:52:35 Mississippi Flush
00:56:39 talking / no name dropping / slaid cleaves / turned it over to the da
00:59:34 Conversation With the Devil
01:03:30 talking / hard core serious hillbilly redneck country bar
01:05:37 Redneck Mother
01:09:26 talking / ralph stanley and bill monroe
01:11:00 Wanna Rock and Roll
01:14:03 talking / band should get instruments and learn to play 'em
01:15:03 The Messenger
01:19:38 talking / thank you
01:20:43 Mother Hubbard's Blues
01:27:13 thanks / post show
01:29:05 finit
A note to myself: if you have a table of counts and labels, perhaps created by something | something | uniq -c, you can swap the order of the labels and the counts / change the order of columns in vi via the following regex.
1234
1 labelone
2 labeltwo
3 labelthree
99 label99
highlight in visual mode with V then run the following regex: s/\(\d\+\)\s\+\([a-zA-Z0-9_]*\)/\2 \1/
1234
labelone 1
labeltwo 2
labelthree 3
label99 99
or turn them into the correct format for a python dict via s/\(\d\+\)\s\+\([a-zA-Z0-9_]*\)/'\2': \1,/
Quick utilities to help with data analysis from the shell:
Print numbered column names of a csv or tsv. You can specify a file or it will read from stdin. It will also guess the separator, whichever of tab or comma is more common; or you may specify with --separator. This is particularly useful if you want to use awk to select columns.
#!/usr/bin/python# you use macports, you probably want the first line to be exactly #!/opt/local/bin/python# Copyright 2015 Earl Hathaway rblog Ray at Bans earlh dot com (take my sunglasses off to email me)# License: The author or authors of this code dedicate any and all copyright interest in this code to the public domain.## print numbered column names or headers from a file or stdin if present with an optional field separator# tested to work with python from 2.7 to 3.4from__future__importprint_functionimportargparseimportmathimportos.pathimportsysstdin=notsys.stdin.isatty()parser=argparse.ArgumentParser(description='print numbered column headers')parser.add_argument('file',nargs='?',help='filename(default: stdin if not a tty)')parser.add_argument('--separator',dest='separator',nargs=1,help='specify the field separator (default: whichever of comma or tab is more common)')parser.add_argument('--python_dict',dest='pydict',action="store_true",help='emit a python dict?')args=parser.parse_args(sys.argv[1:])ifargs.fileisnotNoneandnotos.path.isfile(args.file):print('File "%s" does not exist'%args.file)sys.exit(0)first=Noneifstdin:first=sys.stdin.readline()elifargs.fileisnotNone:withopen(args.file,'r')asf:first=f.readline()else:print('no file specified and nothing on stdin')parser.print_help()sys.exit(0)sep=Noneifargs.separatorisNone:n_comma=first.count(',')n_tabs=first.count('\t')sep="\t"ifn_tabs>=n_commaelse","else:sep=args.separator[0]fields=first.split(sep)# emit a python dict to copy into code; should be zero basedifargs.pydict:pydict='{'+(', '.join(['\'%s\': %d'%(val.strip(),idx)foridx,valinenumerate(fields)]))+'}'print(pydict)sys.exit(0)# calculate indentation for fields so they don't staggerwidth=0iflen(fields)<10elseint(math.ceil(math.log10(len(fields))))format=' %%%dd %%s'%widthforidx,valinenumerate(fields):print(format%(idx+1,val.strip()))
I recently switched to fastmail in lieu of gmail, mostly because I increasingly dislike google’s stance on privacy, their integration between products, and their ongoing updates to gmail. I unfortunately updated gmail on my phone, and their new material design ethos was designed by an idiot who thinks that they should have whitespace everywhere, wasting tons of space already in short supply. I now can only see 5.5 messages in the inbox view, whereas I used to be able to see 8, an incredibly annoying change in the most important screen. So I switched.
A review of fastmail a several months in:
tl;dr: gmail is a better web application, and a better android application. Choose fastmail if you value privacy; choose gmail otherwise.
Positives
it’s not gmail
privacy
gmail shrunk the view window on android for some stupid flat design rationale; they appear to assume everyone reads email on a 6 inch phone
Negatives
fastmail pretends to be a gmail style email client where the unit of manipulation is a conversation, not a message. But the underlying message orientation peeks through in many cases.
When deleting a conversation, it has repeatedly asked if I want to delete the entire conversation (what else would I want?) and had a Yes/No for don’t ask me again. I’ve clicked “don’t ask again” at least 3 times. It doesn’t take.
if you archive a conversation, the sent emails also move to archive out of sent. This is wrong.
Settings feels like my first javascript project.
routing rules have to be very simple and sometimes don’t work.
The UI for setting up routing rules is shit; you have to add them, click, add, then scroll to the top of a very long page and click “apply all changes” for the rules to take (yes, I missed that while porting rules from my old webmail and had to redo 40+ rules). It’s essentially two-phase commit ala git; not at all what I expected for a webmail ui.
The rules don’t work as you would expect: eg messages from “a@b.com” do not match “sender ends with” “b.com”.
Rules can only filter on one thing at a time — no compound rules on eg sender and subject. When you create a rule, it doesn’t offer to apply to existing messages in the inbox.
Rules can’t use “or”. So if you filter on receiver, you can’t say a@b.com or b@b.com or c@b.com. Instead, you have to have one rule per each. By the time you have 100+ of these, it’s damn annoying.
spam filtering is crappy:
When you mark something as not spam, it is delivered to your inbox and skips rules.
there’s no ability to sort by spam score. Hopefully the most likely nonspam would have the lowest score, so it would be convenient to sort by that to find nonspam.
the spam filter doesn’t learn: I’ve had to mark a loan payment confirmation email as not-spam every single month I’ve used fastmail
It sometimes loses the send button while composing messages.
No option to “filter emails like this”; instead, you have to copy and paste eg the address you want to filter into a screen 3 clicks away.
By default, it doesn’t load images in html email. There is a link that tries to load the images in the email you’re viewing; it works perhaps 2/3 of the time.
The rich editor is crap.
For just one of a long list: paste tsv data in there; it strips all the tabs. Awesome. So a b c pastes as abc. Wat?
the mobile site on firefox lags typing like 10+ seconds if you have a quoted reply in the message box. It’s strictly amateur hour.
The application disables access to files on mobile phones even after using requesting desktop in firefox. surprise!
Potential Dealbreakers:
They attempt to monetize security in an incredibly stupid way. If you setup two factor authentication to text a code to your phone, they charge for the sms messages — 0.12 each! — even on $40/year accounts. That’s just chintzy. Better yet, because they’re run by cheap dicks, purchased sms credits expire after a year!!! When I saw that it felt like purchasing a prepaid cellphone at a gas station level cheap. They’re seriously pricing at 1600 hundred times the pricing twilio has on their web page for joe-random-user, not even considering volume discounts. Monetizing security makes you an asshole.
Their calendar implementation doesn’t understand meeting requests from Outlook. For example, I got a meeting request for 3pm PDT (sent as 2200 Greenwich; see excerpt from the calendar invite below) that Fastmail interpreted as 2pm PDT / 10pm BST. What on earth? Exchange is only the most common professional calendar server; why would you assume fastmail interoperates with Outlook?
1
DTSTART;TZID=Greenwich Standard Time:20150729T220000
In summary, there’s just a lot of annoyances that make me assume the devs don’t use their own product or they’d fix it out of sheer annoyance. But they don’t sell your information, or decide to shrink the number of messages viewable in your inbox in order to conform to some stupid corporate design ethos.
Assume we have a logistic regression of the form $\beta_0 + \beta_1 x$, and for value $x_0$ we predict success probability $p(x_0)$.
Which of the following is correct?
This is a straightforward question that elucidates whether you understand regression, particularly the ceteris paribus interpretation of multiple regression.
let $Y$ be the total value of change in your pocket;
let $X_1$ be the total number of coins;
let $X_2$ be the total number of pennies, nickels, and dimes.
Now, regress $Y$ on $X_1$ or $Y$ on $X_2$ alone. Both $\beta_1$ and $\beta_2$ would be positive.
If you regress $Y$ on $X_1 + X_2$, what are the signs of $\beta_1$ and $\beta_2$?
Consider holding $X_2$ constant: if $X_1$ increases by 1, ie you turn a penny, nickle, or dime into a quarter, then $Y$ surely increases. Therefore $\beta_1$ is positive.
Now consider holding $X_1$ constant and increasing $X_2$. If the number of pennies, nickles, and dimes increases while the total number of coins stays constant, you’re replacing quarters with a lower valued coin. Thus increasing $X_2$ can decrease $Y$, so it is entirely possible that $\beta_2$ is negative.
Previously, I wrote about a common interview question: given an array of words, output them in decreasing frequency order, and I provided solutions in java, java8, and python.
Here’s the reason I love R: this can be accomplished in 3 lines of code.
java8 has a bunch of nice improvements, and over the holidays I’ve had time to play with them a bit.
First, say goodbye to requiring Apache Commons for really simple functionality, like joining a string!
12345678910111213141516171819202122
importstaticjava.util.stream.Collectors.joining;importjava.util.Arrays;importjava.util.List;/** * */publicclassStringUtils{publicstaticvoidmain(String[]args){List<String>words=Arrays.asList("a","b","a","a","b","c","a1","a1","a1");// old style print each element of a list: Arrays.toString(result.toArray())puts("java6 style %s",Arrays.toString(words.toArray()));puts("java8 style [%s]",words.stream().collect(joining(", ")));}publicstaticvoidputs(Strings){System.out.println(s);}publicstaticvoidputs(Stringformat,Object...args){puts(String.format(format,args));}}
java8 also massively cleans up some common operations. A common interview question is given an array or list of words, print them in descending order by count, or return the top n sorted by count descending. A standard program to do this may go like this: create a map from string to count; reverse the map to go from count to array of words with that count, then descend to the correct depth.
importjava.util.ArrayList;importjava.util.Arrays;importjava.util.Collections;importjava.util.HashMap;importjava.util.List;importjava.util.Map;importjava.util.TreeMap;/** * get the n highest frequency words */publicclassWordCounts{publicstaticvoidmain(String[]args){String[]words=newString[]{"a","b","a","a","b","c","a1","a1","a1"};for(intdepth=0;depth<4;depth++){List<String>result=getMostFrequentWords(words,depth);puts("depth %d -> %s",depth,Arrays.toString(result.toArray()));puts("");}}publicstaticList<String>getMostFrequentWords(String[]words,intdepth){if(words==null||words.length==0||depth<=0)returnCollections.emptyList();// word -> countsHashMap<String,Integer>counts=newHashMap<>();for(Stringword:words){if(counts.containsKey(word))counts.put(word,counts.get(word)+1);elsecounts.put(word,1);}// count -> list of words with that countTreeMap<Integer,ArrayList<String>>countmap=newTreeMap<>();for(Map.Entry<String,Integer>entry:counts.entrySet()){if(countmap.containsKey(entry.getValue()))countmap.get(entry.getValue()).add(entry.getKey());else{ArrayList<String>l=newArrayList<>();l.add(entry.getKey());countmap.put(entry.getValue(),l);}}// iterate through treemap to desired depthArrayList<String>result=newArrayList<>();while(result.size()<=depth){for(Integeri:countmap.descendingKeySet()){ArrayList<String>list=countmap.get(i);if(list.size()+result.size()<depth){result.addAll(list);}else{for(Strings:list){result.add(s);if(result.size()==depth)returnresult;}}}}returnresult;}publicstaticvoidputs(Strings){System.out.println(s);}publicstaticvoidputs(Stringformat,Object...args){puts(String.format(format,args));}}
Using java8 streams, we can clean up much of this. For starters, creating the map from word –> word count is essentially build in.
123
// word -> countsMap<String,Long>counts=Arrays.stream(words).collect(Collectors.groupingBy(s->s,Collectors.counting()));
Java8 also directly supports inverting or reversing a map, replacing the need to either do it by hand or use guava’s bi-directional map. In the common case, where values are unique, this will suffice:
1234
// count -> list of words: reverse the counts mapMap<Long,String>countmap=counts.entrySet().stream().collect(Collectors.toMap(Map.Entry::getValue,Map.Entry::getKey));puts("countmap: %s",countmap);
Unfortunately, in my case that throws an exception because there is more than one word with the same count. So it’s slightly more complicated:
123
// count -> list of words: reverse a map with duplicate values, collecting duplicates in an ArrayListMap<Long,ArrayList<String>>countmap=counts.entrySet().stream().collect(Collectors.groupingBy(Map.Entry<String,Long>::getValue,Collectors.mapping(Map.Entry<String,Long>::getKey,Collectors.toCollection(ArrayList::new))));
But I really want a treemap, so I can iterate over they keys in order. Fortunately, I can specify which type of map I want
You have an urn with 10 coins in it: 9 fair, and one that is heads only. You draw a coin at random from the urn, then flip it 5 times. What is the probability that you get a head on the 6th flip given you observed head on each of the first 5 flips?
Let $H_i$ be the event we observe head on the $i$th flip, and let $C_i$ be the event we draw the $i$th coin, $i = 1,…,10$.
Then we wish to calculate (using range syntax for brevity)
$$(
P(H_6 | H_1 H_2 H_3 H_4 H_5) = P(H_6 | H_{1:5})
$$)
Conditioning on which coin we drew, and exploiting the symmetry between coins 1 to 9:
Alternatively, you can use R to calculate the probability via brute force by repeatedly sampling according to our problem and counting the number of heads observed.
// function:boollearn_tree(constfloat**target,unsignedintnum_classes);// and I'm attempting to call with:float**residuals;// compiling produces:error:cannotinitializeaparameteroftype'constfloat**'withanlvalueoftype'float**'
what on earth? It turns out the winner is this beautiful bit of syntax: