Java 8 has a bunch of nice improvements, and over the holidays I've had time to play with them a bit.
First, say goodbye to requiring Apache Commons for really simple functionality, like joining strings!
    import static java.util.stream.Collectors.joining;

    import java.util.Arrays;
    import java.util.List;

    public class StringUtils {

        public static void main(String[] args) {
            List<String> words = Arrays.asList("a", "b", "a", "a", "b", "c", "a1", "a1", "a1");

            // old style print each element of a list: Arrays.toString(words.toArray())
            puts("java6 style %s", Arrays.toString(words.toArray()));
            puts("java8 style [%s]", words.stream().collect(joining(", ")));
        }

        public static void puts(String s) {
            System.out.println(s);
        }

        public static void puts(String format, Object... args) {
            puts(String.format(format, args));
        }
    }
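Two related conveniences are worth knowing here: String.join, also new in Java 8, and the three-argument overload of Collectors.joining that takes a prefix and a suffix. Here is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoinExamples {

    // String.join accepts any Iterable<? extends CharSequence>, no stream needed.
    static String joinPlain(List<String> words) {
        return String.join(", ", words);
    }

    // joining(delimiter, prefix, suffix) adds the brackets for us.
    static String joinBracketed(List<String> words) {
        return words.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        System.out.println(joinPlain(words));     // a, b, c
        System.out.println(joinBracketed(words)); // [a, b, c]
    }
}
```

With the prefix/suffix overload, the brackets no longer need to live in the format string.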
Java 8 also massively cleans up some common operations. A common interview question: given an array or list of words, print them in descending order by count, or return the top n sorted by count descending. A standard pre-Java 8 program might go like this: build a map from each word to its count; invert that map so it goes from each count to the list of words with that count; then walk the counts in descending order until n words have been collected.
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    /**
     * get the n highest frequency words
     */
    public class WordCounts {

        public static void main(String[] args) {
            String[] words = new String[]{"a", "b", "a", "a", "b", "c", "a1", "a1", "a1"};

            for (int depth = 0; depth < 4; depth++) {
                List<String> result = getMostFrequentWords(words, depth);
                puts("depth %d -> %s", depth, Arrays.toString(result.toArray()));
                puts("");
            }
        }

        public static List<String> getMostFrequentWords(String[] words, int depth) {
            if (words == null || words.length == 0 || depth <= 0)
                return Collections.emptyList();

            // word -> counts
            HashMap<String, Integer> counts = new HashMap<>();
            for (String word : words) {
                if (counts.containsKey(word))
                    counts.put(word, counts.get(word) + 1);
                else
                    counts.put(word, 1);
            }

            // count -> list of words with that count
            TreeMap<Integer, ArrayList<String>> countmap = new TreeMap<>();
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                if (countmap.containsKey(entry.getValue()))
                    countmap.get(entry.getValue()).add(entry.getKey());
                else {
                    ArrayList<String> l = new ArrayList<>();
                    l.add(entry.getKey());
                    countmap.put(entry.getValue(), l);
                }
            }

            // iterate through treemap to desired depth; one descending pass is
            // enough, and it stops early once depth words have been collected
            ArrayList<String> result = new ArrayList<>();
            for (Integer i : countmap.descendingKeySet()) {
                ArrayList<String> list = countmap.get(i);
                if (list.size() + result.size() < depth) {
                    result.addAll(list);
                } else {
                    for (String s : list) {
                        result.add(s);
                        if (result.size() == depth)
                            return result;
                    }
                }
            }
            return result;
        }

        public static void puts(String s) {
            System.out.println(s);
        }

        public static void puts(String format, Object... args) {
            puts(String.format(format, args));
        }
    }
Using Java 8 streams, we can clean up much of this. For starters, creating the map from word -> word count is essentially built in.
    // word -> counts
    Map<String, Long> counts = Arrays.stream(words)
            .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
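If you would rather keep an explicit loop, Map.merge (also added in Java 8) removes the containsKey/put branching from the counting step. This is a small sketch of that alternative, not part of the stream version; the class name MergeCount is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class MergeCount {

    // merge() stores 1 for a word seen for the first time, otherwise it
    // combines the existing count with 1 using Integer::sum.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "a", "b", "c", "a1", "a1", "a1"};
        System.out.println(count(words));
    }
}
```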
Java 8 streams also make it easy to invert (reverse) a map, replacing the need to either do it by hand or use Guava's bi-directional map (BiMap). In the common case, where values are unique, this will suffice:
    // count -> word: reverse the counts map (only works when every count is unique)
    Map<Long, String> countmap = counts.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getValue, Map.Entry::getKey));
    puts("countmap: %s", countmap);
Unfortunately, in my case that throws an IllegalStateException (duplicate key), because more than one word has the same count. So it's slightly more complicated:
    // count -> list of words: reverse a map with duplicate values, collecting duplicates in an ArrayList
    Map<Long, ArrayList<String>> countmap = counts.entrySet().stream()
            .collect(Collectors.groupingBy(
                    Map.Entry<String, Long>::getValue,
                    Collectors.mapping(
                            Map.Entry<String, Long>::getKey,
                            Collectors.toCollection(ArrayList::new))));
But I really want a TreeMap, so I can iterate over the keys in order. Fortunately, I can specify which type of map I want.
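One way this might look is the three-argument overload Collectors.groupingBy(classifier, mapFactory, downstream), with TreeMap::new as the map factory. The sketch below puts the pieces together; the class name WordCountsStreams and the topN helper are made up for this example. Words that share a count are sorted alphabetically here, which is my own assumption to keep the output stable, since the intermediate map gives no ordering guarantee.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountsStreams {

    // Returns up to n words, most frequent first.
    static List<String> topN(String[] words, int n) {
        if (words == null || n <= 0) {
            return Collections.emptyList();
        }

        // word -> count
        Map<String, Long> counts = Arrays.stream(words)
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));

        // count -> words with that count, collected into a TreeMap
        TreeMap<Long, List<String>> countmap = counts.entrySet().stream()
                .collect(Collectors.groupingBy(
                        Map.Entry<String, Long>::getValue,
                        TreeMap::new,
                        Collectors.mapping(
                                Map.Entry<String, Long>::getKey,
                                Collectors.toList())));

        // walk counts from highest to lowest, sort ties, keep the first n words
        return countmap.descendingMap().values().stream()
                .flatMap(group -> group.stream().sorted())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a", "a", "b", "c", "a1", "a1", "a1"};
        System.out.println(topN(words, 3)); // [a, a1, b]
    }
}
```

Because TreeMap keeps its keys sorted, descendingMap() gives the highest counts first without a separate sort over the counts.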