PerlMonks
Tokenizing XML
by Skeeve (Parson) on Dec 26, 2005 at 14:06 UTC (#519142=perlquestion)
Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

Merry Christmas, fellow monks!

For a BeanShell macro (yes, it's not Perl) I need a regular expression for tokenizing XML. I've read several nodes here advising against parsing XML with regular expressions. But since I don't want to parse it, only to split an XML file held in a String into its tokens, I thought it might be a good idea to ask for your assistance.

The regular expression I have now (see below) is sufficient for the XML in question. But if it's not too much overhead, I'd love to be able to tokenize any valid piece of XML with it. To be specific: just tags, comments, CDATA, and the prolog. I don't care about entities or any DTD.

The expression I have now is (split up for readability):
If this matches, one of these back references is not empty:
Many thanks in advance!

s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{% +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
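For illustration only, a tokenizer of the kind described above (tags, comments, CDATA, and the prolog) might be sketched in Java, whose `java.util.regex` classes are what a BeanShell macro would call. This is not the expression from the post. The class name `XmlTokenizer`, the `Token` type, the group names, and the pattern itself are all assumptions made for the sketch. It follows the same idea as the post: on every match, exactly one group is non-empty, and that group tells you the token kind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: splits an XML string into coarse tokens without parsing it.
public class XmlTokenizer {

    // One token: which alternative matched, and the matched text.
    public static final class Token {
        public final String kind;
        public final String text;
        Token(String kind, String text) { this.kind = kind; this.text = text; }
    }

    // Group names, in the same order as the alternatives in TOKEN.
    private static final String[] KINDS = {"comment", "cdata", "pi", "tag", "text"};

    // Specific constructs come first so that the generic tag rule cannot claim them.
    // DOTALL lets comments, CDATA sections and processing instructions span lines.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<comment><!--.*?-->)"
        + "|(?<cdata><!\\[CDATA\\[.*?\\]\\]>)"
        + "|(?<pi><\\?.*?\\?>)"                // XML declaration / processing instruction
        // A tag may contain quoted attribute values that themselves contain '>'.
        + "|(?<tag></?[^\\s<>/!?][^<>\"']*(?:(?:\"[^\"]*\"|'[^']*')[^<>\"']*)*>)"
        + "|(?<text>[^<]+)",                   // character data between markup
        Pattern.DOTALL);

    public static List<Token> tokenize(String xml) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(xml);
        while (m.find()) {
            // Exactly one named group took part in the match; find out which one.
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(new Token(kind, m.group()));
                    break;
                }
            }
        }
        return tokens;
    }
}
```

Calling `XmlTokenizer.tokenize(xml)` returns the tokens in document order. The sketch deliberately ignores DTDs and entities, as the question does; a DOCTYPE declaration or a stray `<` fits none of the alternatives and is not handled properly, so this is only suitable for input already known to be well-formed.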