I saw there was another thread similar to this, but I didn't want to hijack it, so here goes.

In a program I am working on, I need to split a string (a line from a book) into words. I am programming in Java, and String has a split method that takes a regular expression and uses it to split the string into an array of strings. My question is: what should the regular expression look like?

At the moment I have ", | ", which seems to work for a space or a comma followed by a space, as in "Hello, my name is tom". But something like "Hello. Are you there" ends up with "Hello." as a word in the array, when it should be "Hello". How do I make the regex match ". " or ", " or " " or ": " or "; "? I tried earlier but couldn't get it to work.

I hope I explained that OK. Here's an example: "Hello, my: name; is. Bob" with the current expression yields an array containing "Hello" "my:" "name;" "is." "Bob". What I am looking for is "Hello" "my" "name" "is" "Bob".
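Rather than splitting on the separators, match the words themselves and collect them:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    // Input line from the question
    String subjectString = "Hello, my: name; is. Bob";
    List<String> matchList = new ArrayList<String>();
    try {
        // One or more word characters in a row counts as one word
        Pattern regex = Pattern.compile("[\\w]+");
        Matcher regexMatcher = regex.matcher(subjectString);
        // find() advances to the next match each time it is called
        while (regexMatcher.find()) {
            matchList.add(regexMatcher.group());
        }
    } catch (PatternSyntaxException ex) {
        // Syntax error in the regular expression
    }

Then just iterate through matchList.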
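For anyone who would rather stay with String.split as in the original question, here is a minimal sketch that splits on runs of whitespace and the punctuation marks mentioned there. The class and method names (SplitOnPunctuation, splitWords) are made up for illustration, and the set of separators is an assumption based on the examples in the question.

```java
import java.util.Arrays;

class SplitOnPunctuation {
    // Hypothetical helper: splits a line on runs of whitespace and the
    // punctuation marks named in the question (. , ; :).
    static String[] splitWords(String line) {
        return line.split("[\\s.,;:]+");
    }

    public static void main(String[] args) {
        // prints [Hello, my, name, is, Bob]
        System.out.println(Arrays.toString(splitWords("Hello, my: name; is. Bob")));
    }
}
```

One caveat with split: if the line starts with a separator (for example a leading space), the first array element is an empty string. Matching the words with a Matcher avoids that.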
Thank you so much!! It works, but could you explain the expression? I assume \\w is any whitespace character?
\w is a shorthand character class that matches a word character: a letter, a digit, or an underscore. (The doubled backslash in "\\w" is only Java string escaping.) To match whitespace, use a literal space or the character class \s. Hope that explains it.
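To see the difference between the two classes, here is a small sketch that runs \w+ and \s+ over the same sentence. The class and helper names (WordVsWhitespace, findAll) are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordVsWhitespace {
    // Hypothetical helper: collects every match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Hello. Are you there";
        // \w+ picks out the words: prints [Hello, Are, you, there]
        System.out.println(findAll("\\w+", text));
        // \s+ picks out the gaps between them: prints 3
        System.out.println(findAll("\\s+", text).size());
    }
}
```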
You might want to allow apostrophes and dashes (and similar characters) as well, otherwise something like "hello, my name isn't bob, it's dave!" will come out as:
0. hello
1. my
2. name
3. isn
4. t
5. bob
6. it
7. s
8. dave
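One possible way to do that, as a sketch rather than the definitive pattern: allow an apostrophe or dash only when it sits between word characters. The class and method names (WordsWithApostrophes, words) are hypothetical. A simpler alternative is the character class [\w'-]+, but that would also pick up a stray dash or a quote mark standing on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsWithApostrophes {
    // A run of word characters, optionally followed by groups of
    // (apostrophe or dash) + more word characters, e.g. isn't, well-known.
    private static final Pattern WORD = Pattern.compile("\\w+(?:['-]\\w+)*");

    // Hypothetical helper: returns the words found in the line, in order.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [hello, my, name, isn't, bob, it's, dave]
        System.out.println(words("hello, my name isn't bob, it's dave!"));
    }
}
```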