I have a rather complicated regex, but it works.
\S*\d+\S*|\p{Punct}{2,}\S*|\S*\p{Punct}{2,}|[\p{Punct}&&[^-]]+|(?<![a-z])\-(?![a-z])
Explanation:
Match this alternative «\S*\d+\S*»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “digit” «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\p{Punct}{2,}\S*»
Match a character from the POSIX character class “punct” «\p{Punct}{2,}»
Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\S*\p{Punct}{2,}»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a character from the POSIX character class “punct” «\p{Punct}{2,}»
Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
Or match this alternative «[\p{Punct}&&[^-]]+»
Match a single character present in the list below «[\p{Punct}&&[^-]]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character from the POSIX character class “punct” «\p{Punct}»
Except the literal character “-” «&&[^-]»
Or match this alternative «(?<![a-z])\-(?![a-z])»
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<![a-z])»
Match a single character in the range between “a” and “z” «[a-z]»
Match the character “-” literally «\-»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?![a-z])»
Match a single character in the range between “a” and “z” «[a-z]»
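The last two alternatives are the ones that decide what happens to hyphens, so it may help to try them in isolation. The following is only a minimal sketch: the class name AlternativeDemo and the two helper methods are hypothetical names chosen for illustration, and the patterns are copied from the regex above without any flags.

```java
import java.util.regex.Pattern;

class AlternativeDemo {
    // Fourth alternative: one or more punctuation characters, excluding the hyphen
    // (Java character class intersection with &&).
    static final Pattern PUNCT_NO_HYPHEN = Pattern.compile("[\\p{Punct}&&[^-]]+");

    // Fifth alternative: a hyphen with no lowercase letter directly before or after it.
    static final Pattern LOOSE_HYPHEN = Pattern.compile("(?<![a-z])\\-(?![a-z])");

    // True if the whole string consists of punctuation other than the hyphen.
    static boolean isPunctWithoutHyphen(String s) {
        return PUNCT_NO_HYPHEN.matcher(s).matches();
    }

    // True if the string contains a hyphen that is not attached to a letter.
    static boolean hasLooseHyphen(String s) {
        return LOOSE_HYPHEN.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isPunctWithoutHyphen(","));  // true
        System.out.println(isPunctWithoutHyphen("-"));  // false
        System.out.println(hasLooseHyphen("a-b"));      // false: the hyphen sits between letters
        System.out.println(hasLooseHyphen("a - b"));    // true: the hyphen stands alone
    }
}
```

Because the hyphen is excluded from the fourth alternative and guarded by lookarounds in the fifth, a single hyphen inside a word such as a-b is left alone.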
Example:
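String text = "a-b ab--- - ---a --- , ++++ ?%# $22 43 4zzv";
// (?i) makes the [a-z] lookarounds case-insensitive
String rx = "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";
// Replace every match with a space, then trim leading and trailing whitespace
String result = text.replaceAll(rx, " ").trim();
System.out.println(result);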
The code above prints:
a-b
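To see why only a-b survives, it can be useful to list what the regex actually matches instead of replacing it. This is a sketch under the same assumptions as the example: the class name MatchLister and the method matches are hypothetical, and the pattern and input are taken unchanged from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLister {
    static final Pattern RX = Pattern.compile(
        "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}"
        + "|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])");

    // Collects every substring the regex matches, in order of appearance.
    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = RX.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a-b ab--- - ---a --- , ++++ ?%# $22 43 4zzv";
        // prints [ab---, -, ---a, ---, ,, ++++, ?%#, $22, 43, 4zzv]
        System.out.println(matches(text));
    }
}
```

Every token except a-b appears in the list, which is why replaceAll followed by trim leaves just that one word.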
I am trying to use it, and it partially works, but it also gets rid of hyphen(s) that are inside a word. – zzz