You can get an arbitrary sequence of bytes into a string that still has the UTF8 flag set by hacking the string's guts directly, for example with Inline::C:
use Inline C;
use Devel::Peek;
utf8::upgrade($str = "");    # empty string with the UTF8 flag switched on
Dump($str);
twiddle($str, "\x{BD}\x{BE}\x{BF}\x{C0}\x{C1}\x{C2}");    # append six raw bytes from C
Dump($str);
__DATA__
__C__
/** append arbitrary bytes to a Perl scalar **/
void twiddle(SV *s, const char *t)
{
    sv_catpv(s, t);  /* appends the bytes of t as-is; the UTF8 flag on s is left unchanged */
}
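Typical output of the script (Devel::Peek's Dump writes to STDERR, so the "Malformed UTF-8 character" warnings appear in the middle of the second PV line, while the buffer is being decoded for the [UTF8 "..."] display):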
SV = PV(0x80029bb0) at 0x80072008
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x80155098 ""\0 [UTF8 ""]
CUR = 0
LEN = 12
SV = PV(0x80029bb0) at 0x80072008
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x80155098 "\275\276\277\300\301\302"\0Malformed UTF-8 character (unexpected continuation byte 0xbd, with no preceding start byte) in subroutine entry at ./invalidUTF.pl line 6.
Malformed UTF-8 character (unexpected continuation byte 0xbe, with no preceding start byte) in subroutine entry at ./invalidUTF.pl line 6.
Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in subroutine entry at ./invalidUTF.pl line 6.
Malformed UTF-8 character (unexpected non-continuation byte 0xc1, immediately after start byte 0xc0) in subroutine entry at ./invalidUTF.pl line 6.
Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xc2) in subroutine entry at ./invalidUTF.pl line 6.
[UTF8 "\x{0}\x{0}\x{0}\x{0}\x{0}"]
CUR = 6
LEN = 12
Unicode and Perl are like Bonnie and Clyde: they stole your time and treated you to a fantastic evening and a great night :) – gaussblurinc