Tokenizing text with a StringTokenizer

First determine which characters delimit your tokens. For example, if you’re parsing a comma-separated values (CSV) file, your delimiter would probably be the comma. Then create a StringTokenizer, passing it the String you want to split into tokens and your set of delimiter characters. If you leave out the delimiters, the default set " \t\n\r\f" is used: the space, tab, newline, carriage-return and form-feed characters.
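The two ways of constructing the tokenizer can be sketched as follows. This is only a minimal illustration: the class name DelimiterDemo and the helper methods split and splitDefault are made up for this example and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; only StringTokenizer itself comes from the JDK.
class DelimiterDemo {
    // Splits text on any character contained in delims.
    static List<String> split(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Splits text on the default delimiter set " \t\n\r\f".
    static List<String> splitDefault(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("red,green,blue", ","));        // prints [red, green, blue]
        System.out.println(splitDefault("red\tgreen\nblue x"));  // prints [red, green, blue, x]
    }
}
```

Note that the delimiter argument is a set of single characters, not a separator string: passing ", " splits on commas and on spaces individually.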

Then loop: call hasMoreTokens to check whether any tokens remain and nextToken to fetch the next one, until hasMoreTokens returns false. The StringTokenizer keeps track of the current position, and nextToken returns the run of characters up to the next delimiter character. You can change the delimiter set at any time by calling nextToken(String delim); the new set stays in effect for subsequent calls.
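Switching delimiters partway through a string might look like the sketch below. The record format "name:score,score,..." and the class name SwitchDelimiterDemo are assumptions made up for this illustration. One detail to watch: after a token is returned, the tokenizer is still positioned on the old delimiter, so the new set should normally include that character as well, otherwise it becomes part of the next token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class for a made-up "name:score,score,..." record format.
class SwitchDelimiterDemo {
    static List<String> parse(String record) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(record, ":");
        if (!st.hasMoreTokens()) {
            return parts;
        }
        parts.add(st.nextToken()); // the name, read up to the ':'
        if (st.hasMoreTokens()) {
            // Switch to ":," so the pending ':' is skipped and the
            // scores are split on commas from here on.
            parts.add(st.nextToken(":,"));
            while (st.hasMoreTokens()) {
                parts.add(st.nextToken()); // still uses ":,"
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parse("Mike:1,2,3")); // prints [Mike, 1, 2, 3]
    }
}
```

Had the second call been nextToken(",") instead, the first score would come back as ":1", because the ':' would no longer count as a delimiter.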

Here’s an example that reads in a CSV file and prints out its tokens, starting a new output line at each quoted name.

csv.txt:

 "Mike",1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,9,8,7,6,5,4  
 "George",3,6,4,2,3,5,2,3,4  
 "Anne",7,4,3,2,8,5,4,1,,3,4,5  

Main.java:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class Main
{
   public static void main(String[] args) throws IOException {
      if (args.length != 1) {
         System.out.println("Usage: java Main <textfile>");
         System.exit(1);
      }

      String text = readFile(args[0]);

      // Split on spaces, commas and the line separator that readFile appends.
      String lineSep = System.getProperty("line.separator");
      StringTokenizer st = new StringTokenizer(text, " ," + lineSep);
      while (st.hasMoreTokens()) {
         String token = st.nextToken();
         // A quoted token is a name, so start a new output line for its row.
         if (token.charAt(0) == '"') {
            System.out.println();
         }
         System.out.print(token + "/");
      }
   }

   // Reads the whole file into one String, ending each line with the platform line separator.
   public static String readFile(String filename) throws IOException {
      StringBuilder total = new StringBuilder();
      String lineSep = System.getProperty("line.separator");
      // try-with-resources closes the reader even if readLine throws.
      try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
         String line;
         while ((line = br.readLine()) != null) {
            total.append(line).append(lineSep);
         }
      }
      return total.toString();
   }
}

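Output (the first line is blank because a newline is printed before each quoted name, and the empty field in Anne’s row is dropped because StringTokenizer never returns empty tokens):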


"Mike"/1/2/3/4/5/6/7/8/9/1/2/3/4/5/6/9/8/7/6/5/4/
"George"/3/6/4/2/3/5/2/3/4/
"Anne"/7/4/3/2/8/5/4/1/3/4/5/
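Anne’s row in csv.txt contains an empty field (",,") that does not appear in the output, because consecutive delimiters are skipped together. If empty fields matter, one possible approach is the three-argument constructor with returnDelims set to true, which hands back each delimiter as a token of its own so that adjacent commas can be detected. The sketch below is only an illustration under that assumption; the class name EmptyFieldDemo and the method fields are hypothetical, and it ignores quoting rules of real CSV files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not a full CSV parser.
class EmptyFieldDemo {
    // Splits one line on commas, keeping empty fields as "".
    static List<String> fields(String line) {
        List<String> fields = new ArrayList<>();
        // returnDelims = true: each ',' is returned as its own token.
        StringTokenizer st = new StringTokenizer(line, ",", true);
        boolean lastWasDelim = true; // the start of the line acts as a field boundary
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.equals(",")) {
                if (lastWasDelim) {
                    fields.add(""); // two boundaries in a row: an empty field
                }
                lastWasDelim = true;
            } else {
                fields.add(token);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            fields.add(""); // the line ended on a boundary: trailing empty field
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fields("4,1,,3")); // prints [4, 1, , 3]
    }
}
```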