
Rip.java: stream manipulation for Java programmers

I never learned sed or awk. Or even Perl. But I'm pretty good with Java's regex, and I'm familiar with the new text formatting facilities in Java 5.

So rather than tricking myself into learning sed and awk, I wrote my own stream processor that uses Java's regex and pattern syntax:
jessewilson$ Rip.java
Usage: Rip [flags] <regex> <format>

regex: a Java regular expression, with groups
http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
you can (parenthesize) groups
\s whitespace
\S non-whitespace
\w word characters
\W non-word

format: a Java Formatter string
http://java.sun.com/javase/6/docs/api/java/util/Formatter.html
%[argument_index$][flags][width][.precision]conversion
'%s', '%1$s' - the full matched text
'%2$s' the first (parenthesized) group

Use 'single quotes' to prevent bash from interfering

flags:
--skip_unmatched: ignore input that doesn't match <regex>
-s:

--newline <text>: use <text> to separate lines in output
-n <text>:
So it takes Java regexes in, finds the matching groups in parentheses, and then spits those back out using String.format. Here are some examples:

jessewilson$ echo "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com" |
Rip.java 'ssh.*' '%s'
ssh jessewilson.publicobject.com

jessewilson$ echo "http://publicobject.com/glazedlists/ Glazed Lists Homepage" |
Rip.java 'http://([\w.]+)\S*\s+(.*)' '%3$s: %2$s'
Glazed Lists Homepage: publicobject.com
These examples are just the tip of the iceberg. I suspect I'll be using this tool to munge output from many processes into the input for many other processes.
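The match-then-format step described above can be sketched in a few lines. This is not the actual Rip source (which is elided in this post); the class name RipSketch, the method name rip, and the choice to return null for an unmatched line are all assumptions made for illustration. It processes a single line rather than a whole stream.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the regex-to-format step; not the real Rip source. */
final class RipSketch {

  /**
   * Finds the first match of regex in line and renders it with format.
   * Returning null for an unmatched line is an assumption of this sketch.
   */
  static String rip(String regex, String format, String line) {
    Matcher matcher = Pattern.compile(regex).matcher(line);
    if (!matcher.find()) {
      return null;
    }
    // args[0] is the full match (%1$s); args[i] is group i (so %2$s is group 1).
    Object[] args = new Object[matcher.groupCount() + 1];
    for (int i = 0; i <= matcher.groupCount(); i++) {
      args[i] = matcher.group(i);
    }
    return String.format(format, args);
  }

  public static void main(String[] unused) {
    // The two examples from the post, with backslashes escaped for Java source.
    System.out.println(rip("ssh.*", "%s",
        "7278 ttys001 0:00.66 ssh jessewilson.publicobject.com"));
    System.out.println(rip("http://([\\w.]+)\\S*\\s+(.*)", "%3$s: %2$s",
        "http://publicobject.com/glazedlists/ Glazed Lists Homepage"));
  }
}
```

Putting the full match at index 0 of the argument array is what makes '%1$s' the whole matched text and '%2$s' the first parenthesized group, as the usage text says.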

Try Rip Out


Download Rip.java, make it executable (chmod a+x Rip.java) and put it somewhere on your path. In what is almost certainly more clever than useful, I hacked it up so the uncompiled source can be executed directly by Bash:
/*bin/mkdir /tmp/rip 2> /dev/null
javac -d /tmp/rip $0
java -cp /tmp/rip Rip "$@"
exit
*/
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Rip {
...
}

Replace my clever hack with a .class and wrapper script if you'd prefer.

Coding in the small with Google Collections: AbstractIterator

Part 17 in a Series.

I really like the Java Collections API. So much so, that I use 'em when I'm doing work that isn't particularly collectioney. For example, I recently wrote a quick-n-dirty app that rewrote some files line-by-line. Instead of using a Reader as input, I used an Iterator<String>. The easiest way to create such an iterator is to load the entire file into memory first.

Before:

  public Iterator<String> linesIterator(Reader reader) {
    BufferedReader buffered = new BufferedReader(reader);
    List<String> lines = new ArrayList<String>();

    try {
      for (String line; (line = buffered.readLine()) != null; ) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new RuntimeException(e);
    }

    return lines.iterator();
  }
That code is simple, but inefficient. And it won't work if the file doesn't fit into memory. A better approach is to implement Iterator and to read through the file on demand as the lines are requested. Google Collections' AbstractIterator makes this easy. Whenever a new line is requested, the iterator's computeNext method is called back to read it from the stream.

After:

  public Iterator<String> linesIterator(Reader reader) {
    final BufferedReader buffered = new BufferedReader(reader);

    return new AbstractIterator<String>() {
      protected String computeNext() {
        try {
          String line = buffered.readLine();
          return line != null ? line : endOfData();
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };
  }
This class really takes the fuss out of custom iterators. Now it's not difficult to create iterators that compute a series, process a data stream, or even compose other iterators.
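To show how the callback works, here is a stripped-down stand-in for the idea, written against plain java.util so it runs without the library. It is not the Google Collections source: SimpleAbstractIterator, its State enum, and the skipBlanks helper are hypothetical names for this sketch, and the real class may differ in its details. The demo composes another iterator by filtering out empty strings.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of the computeNext/endOfData pattern; not library code. */
final class LazyIteratorSketch {

  abstract static class SimpleAbstractIterator<T> implements Iterator<T> {
    private enum State { NOT_READY, READY, DONE }

    private State state = State.NOT_READY;
    private T next;

    /** Returns the next element, or the result of endOfData() when exhausted. */
    protected abstract T computeNext();

    /** Marks the iterator as exhausted; the returned null is never handed out. */
    protected final T endOfData() {
      state = State.DONE;
      return null;
    }

    @Override public final boolean hasNext() {
      if (state == State.NOT_READY) {
        next = computeNext(); // the callback runs lazily, only when needed
        if (state != State.DONE) {
          state = State.READY;
        }
      }
      return state == State.READY;
    }

    @Override public final T next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      state = State.NOT_READY;
      T result = next;
      next = null;
      return result;
    }

    @Override public final void remove() {
      throw new UnsupportedOperationException();
    }
  }

  /** Composes another iterator: passes through only the non-empty strings. */
  static Iterator<String> skipBlanks(final Iterator<String> delegate) {
    return new SimpleAbstractIterator<String>() {
      @Override protected String computeNext() {
        while (delegate.hasNext()) {
          String candidate = delegate.next();
          if (!candidate.isEmpty()) {
            return candidate;
          }
        }
        return endOfData();
      }
    };
  }

  public static void main(String[] args) {
    Iterator<String> lines = skipBlanks(Arrays.asList("a", "", "b", "").iterator());
    while (lines.hasNext()) {
      System.out.println(lines.next());
    }
  }
}
```

The subclass only ever answers one question, "what comes next?", and the base class handles the hasNext/next bookkeeping around it.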

Coding in the small with Google Collections: Sets.union, intersection and difference

Part 16 in a Series.

The traditional approach to unions is to first create a new Set, and then to addAll using each component set. You can use a similar approach to do differences and intersections.

Before:

  private static final ImmutableSet<String> LEGAL_PARAMETERS;
  static {
    Set<String> tmp = new HashSet<String>();
    tmp.addAll(REQUIRED_PARAMETERS);
    tmp.addAll(OPTIONAL_PARAMETERS);
    LEGAL_PARAMETERS = ImmutableSet.copyOf(tmp);
  }

  public void login(Map<String, String> params) {
    if (!LEGAL_PARAMETERS.containsAll(params.keySet())) {
      Set<String> unrecognized = new HashSet<String>(params.keySet());
      unrecognized.removeAll(LEGAL_PARAMETERS);
      throw new IllegalArgumentException("Unrecognized parameters: "
          + unrecognized);
    }

    if (!params.keySet().containsAll(REQUIRED_PARAMETERS)) {
      Set<String> missing = new HashSet<String>(REQUIRED_PARAMETERS);
      missing.removeAll(params.keySet());
      throw new IllegalArgumentException("Missing parameters: " + missing);
    }

    ...
  }
Google Collections has methods that do set arithmetic in a single line.

After:

  private static final ImmutableSet<String> LEGAL_PARAMETERS
      = Sets.union(REQUIRED_PARAMETERS, OPTIONAL_PARAMETERS).immutableCopy();

  public void login(Map<String, String> requestParameters) {
    if (!LEGAL_PARAMETERS.containsAll(requestParameters.keySet())) {
      throw new IllegalArgumentException("Unrecognized parameters: "
          + Sets.difference(requestParameters.keySet(), LEGAL_PARAMETERS));
    }

    if (!requestParameters.keySet().containsAll(REQUIRED_PARAMETERS)) {
      throw new IllegalArgumentException("Missing parameters: "
          + Sets.difference(REQUIRED_PARAMETERS, requestParameters.keySet()));
    }

    ...
  }
Unlike the traditional approach, these methods don't make any copies! Instead, they return views that delegate to the provided sets. In the occasional case when the copy is worthwhile, there's a handy method immutableCopy to give you one.
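To make "view" concrete, here is a rough sketch of a difference view built on plain java.util, so it runs without the library. It is not how Google Collections implements Sets.difference; differenceView and DifferenceViewSketch are hypothetical names, and the parameter names in the demo are made up. The point is only that the view holds no elements of its own, so it tracks later changes to the backing sets, while an explicit copy does not.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical sketch of a set view; not the library's implementation. */
final class DifferenceViewSketch {

  /** A read-only, live view of the elements of a that are not in b. */
  static <E> Set<E> differenceView(final Set<E> a, final Set<?> b) {
    return new AbstractSet<E>() {
      @Override public boolean contains(Object o) {
        return a.contains(o) && !b.contains(o);
      }

      @Override public int size() {
        int size = 0;
        for (E e : a) {
          if (!b.contains(e)) {
            size++;
          }
        }
        return size;
      }

      @Override public Iterator<E> iterator() {
        final Iterator<E> source = a.iterator();
        return new Iterator<E>() {
          private E pending;
          private boolean hasPending;

          @Override public boolean hasNext() {
            // Advance through a until an element that b lacks is found.
            while (!hasPending && source.hasNext()) {
              E candidate = source.next();
              if (!b.contains(candidate)) {
                pending = candidate;
                hasPending = true;
              }
            }
            return hasPending;
          }

          @Override public E next() {
            if (!hasNext()) {
              throw new NoSuchElementException();
            }
            hasPending = false;
            return pending;
          }

          @Override public void remove() {
            throw new UnsupportedOperationException();
          }
        };
      }
    };
  }

  public static void main(String[] args) {
    Set<String> required = new HashSet<String>(Arrays.asList("user", "password"));
    Set<String> supplied = new HashSet<String>(Arrays.asList("user"));

    Set<String> missing = differenceView(required, supplied);
    Set<String> snapshot = new HashSet<String>(missing); // explicit copy

    supplied.add("password");      // change a backing set afterwards
    System.out.println(missing);   // prints []
    System.out.println(snapshot);  // prints [password]
  }
}
```

The trade-off is visible in size(): a view recomputes its answer on every call, which is why taking a copy is occasionally the better choice for a set that is read many times.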