Development Regular Expressions

Discussion in 'Software' started by theevilelephant, 19 Oct 2008.

  1. theevilelephant

    theevilelephant Minimodder

    Joined:
    5 Jan 2006
    Posts:
    1,334
    Likes Received:
    36
    I saw there was another thread similar to this, but I didn't want to hijack it so here goes:

    In a program I am working on I need to split a string (which will be a line from a book) into words. I am programming in Java, and String has a split method that takes a reg. exp. as an input, which it then uses to split a string into an array of strings. My question is: what would the regular expression look like?

    At the moment I have ", | " and that seems to work for space or comma-space separators, as in "Hello, my name is tom", but something like "Hello. Are you there" would end up with "Hello." as a word in the array, when it should be "Hello". How do I make the regex find ". " or ", " or " " or ": " or "; "? I tried myself earlier but I just couldn't get it to work.

    I hope I explained it OK. Here's an example.

    "Hello, my: name; is. Bob" with the current expression yields an array filled with:
    "Hello"
    "my:"
    "name;"
    "is."
    "Bob"

    What I am looking for would be:
    "Hello"
    "my"
    "name"
    "is"
    "Bob"
     
  2. RTT

    RTT #parp

    Joined:
    12 Mar 2001
    Posts:
    14,120
    Likes Received:
    74
    /[\w-\']+/
     
  3. woodshop

    woodshop UnSeenly

    Joined:
    14 Oct 2003
    Posts:
    1,408
    Likes Received:
    8
    Except he's in Java, which I don't think has the shorthand classes or the // delimiters,
    so probably more like "[.,:;!$?]+"
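
    For anyone trying the split-based route, here is a minimal sketch. It assumes the punctuation class is widened to include whitespace (\\s), since a punctuation-only class would not split on the spaces between words. The class name SplitWords and the helper splitWords are made up for illustration and are not from this thread.

```java
import java.util.Arrays;
import java.util.List;

public class SplitWords {
    // Hypothetical helper: treats any run of whitespace and/or common
    // punctuation as a single separator between words.
    static List<String> splitWords(String line) {
        return Arrays.asList(line.split("[\\s.,:;!?]+"));
    }

    public static void main(String[] args) {
        // Prints [Hello, my, name, is, Bob]
        System.out.println(splitWords("Hello, my: name; is. Bob"));
    }
}
```

    One thing to watch with split: if the line starts with a separator (a leading space, say), the first element of the array is an empty string, so you may need to skip it.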
     
  4. koola

    koola Minimodder

    Joined:
    11 Jul 2004
    Posts:
    2,401
    Likes Received:
    10
    Input
    Code:
    "Hello, my: name; is. Bob"
    Solution
    Code:
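    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    String subjectString = "Hello, my: name; is. Bob";
    List<String> matchList = new ArrayList<String>();
    try {
    	// Match each run of word characters instead of splitting on separators
    	Pattern regex = Pattern.compile("[\\w]+");
    	Matcher regexMatcher = regex.matcher(subjectString);
    	while (regexMatcher.find()) {
    		// group() returns the text of the current match, i.e. one word
    		matchList.add(regexMatcher.group());
    	}
    } catch (PatternSyntaxException ex) {
    	// Syntax error in the regular expression
    }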
    
    Just iterate through the match list.
     
  5. theevilelephant

    theevilelephant Minimodder

    Thank you so much!! It works, but could you explain the expression? I assume \\w is any whitespace character?
     
  6. koola

    koola Minimodder

    \w is a shorthand character class that matches a word character (letters, digits and the underscore). To match whitespace, use a literal space or the character class \s

    Hope that explains it.
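
    A quick sketch of the difference, in case it helps. The class and method names here (ShorthandClasses, isWord, isWhitespace) are just made up for the demo. Note that inside a Java string literal the backslash has to be doubled, which is why \w is written as "\\w" in the code.

```java
public class ShorthandClasses {
    // True if the whole string consists of word characters
    // (letters, digits, underscore).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // True if the whole string consists of whitespace characters.
    static boolean isWhitespace(String s) {
        return s.matches("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(isWord("Bob_42"));      // true
        System.out.println(isWord("isn't"));       // false: the apostrophe is not a word character
        System.out.println(isWhitespace(" \t"));   // true: space and tab are both whitespace
        System.out.println(isWhitespace("Bob"));   // false
    }
}
```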
     
  7. theevilelephant

    theevilelephant Minimodder

    Thanks! Much appreciated :)
     
  8. RTT

    RTT #parp

    You might want to add in apostrophes and dashes (etc.), otherwise stuff like "hello, my name isn't bob, it's dave!"

    will come out as

    0. hello
    1. my
    2. name
    3. isn
    4. t
    5. bob
    6. it
    7. s
    8. dave

    :)
     
  9. koola

    koola Minimodder

    Agreed :duh:
     
  10. theevilelephant

    theevilelephant Minimodder

    Yes! I've only just noticed that, lol!
    Would it be
    "[\\w'-]+"?
     
  11. RTT

    RTT #parp

    Yes, as per my 1st post :)
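
    For reference, a minimal sketch of the match-based approach with the apostrophe and dash added to the character class. The class name WordExtractor and the method extractWords are hypothetical names for this example. The dash sits at the end of the class so it is read as a literal dash rather than as a range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // Word characters plus apostrophe and dash; compiled once and reused.
    private static final Pattern WORD = Pattern.compile("[\\w'-]+");

    // Hypothetical helper: collects every match of WORD in the line, in order.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(line);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [hello, my, name, isn't, bob, it's, dave]
        System.out.println(extractWords("hello, my name isn't bob, it's dave!"));
    }
}
```

    One caveat with this class: a dash or quote mark standing on its own (as in "yes - no") will also come out as a "word", so book text with that kind of punctuation may need an extra filter.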
     
