Bug

wrong encoding in "from" of emails (utf8, Russian)

Summary

open
Aug 9, 2007
Aug 9, 2007 / vrom
Jun 2, 2008 / stenyak
 

 
All emails (notification and activation) arrive with an unreadable "From" header when the language is Russian (UTF-8).

Return-Path: ....
Received: ....
To: ....
Subject: Рбновлено РІ xxx.xxxx.ru
Content-Type: multipart/alternative; boundary="-streber--------------------------------------"
From: РЎРёССемное Сведомление <do-not-reply@xxx.xxxx.ru>
MIME-Version: 1.0
....
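
The Subject and From lines above appear to carry raw UTF-8 bytes, while mail headers are expected to be plain ASCII. A minimal sketch of a check for that condition follows; `header_needs_encoding` is a hypothetical helper name, not something that exists in streber, and the address is a placeholder.

```php
<?php
// Hypothetical helper (not part of streber): returns true when a header value
// contains any byte outside printable ASCII (0x20-0x7E), i.e. when it would
// need RFC 2047 encoding before being put into a mail header.
function header_needs_encoding($value)
{
    return preg_match('/[^\x20-\x7E]/', $value) === 1;
}

// Plain ASCII value: no encoding needed.
var_dump(header_needs_encoding('do-not-reply@example.org')); // bool(false)

// Raw UTF-8 bytes of three Cyrillic letters in front of the address.
var_dump(header_needs_encoding("\xD0\xA1\xD0\xB8\xD1\x81 <do-not-reply@example.org>")); // bool(true)
```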

Issue report

Minor
Always
all
0.08
 

2 Comments

vrom:the subject of the email is garbled too

10 years ago


This problem was solved in the TYPO3 CMS by:
$this->senderName = t3lib_div::encodeHeader($this->senderName,"base64","UTF-8");

	/**
	 * Implementation of quoted-printable encode.
	 * This function is buggy. It seems that the part that breaks lines at every 76th character fails if the break falls right inside a quoted-printable encoded character!
	 * See RFC 1521, section 5.1 Quoted-Printable Content-Transfer-Encoding
	 * Usage: 2
	 *
	 * @param	string		Content to encode
	 * @param	integer		Length of the lines, default is 76
	 * @return	string		The QP encoded string
	 */
	function quoted_printable($string,$maxlen=76)	{
			// Make sure the string contains only Unix linebreaks
		$string = str_replace(chr(13).chr(10), chr(10), $string);	// Replace Windows breaks (\r\n)
		$string = str_replace(chr(13), chr(10), $string);		// Replace Mac breaks (\r)

		$linebreak = chr(10);			// Default line break for Unix systems.
		if (TYPO3_OS=='WIN')	{
			$linebreak = chr(13).chr(10);	// Line break for Windows. This is needed because PHP on Windows systems sends mail via SMTP instead of using sendmail, and thus the linebreak needs to be \r\n.
		}

		$newString = '';
		$theLines = explode(chr(10),$string);	// Split lines
		foreach ($theLines as $val)	{
			$newVal = '';
			$theValLen = strlen($val);
			$len = 0;
			for ($index=0; $index < $theValLen; $index++)	{	// Walk through each character of this line
				$char = substr($val,$index,1);
				$ordVal = ord($char);
				if ($len>($maxlen-4) || ($len>(($maxlen-10)-4)&&$ordVal==32))	{
					$newVal.='='.$linebreak;	// Add a line break
					$len=0;			// Reset the length counter
				}
				if (($ordVal>=33 && $ordVal<=60) || ($ordVal>=62 && $ordVal<=126) || $ordVal==9 || $ordVal==32)	{
					$newVal.=$char;		// This character is ok, add it to the message
					$len++;
				} else {
					$newVal.=sprintf('=%02X',$ordVal);	// Special character, needs to be encoded
					$len+=3;
				}
			}
			$newVal = preg_replace('/'.chr(32).'$/','=20',$newVal);		// Replaces a possible SPACE-character at the end of a line
			$newVal = preg_replace('/'.chr(9).'$/','=09',$newVal);		// Replaces a possible TAB-character at the end of a line
			$newString.=$newVal.$linebreak;
		}
		return preg_replace('/'.$linebreak.'$/','',$newString);		// Remove last newline
	}

	/**
	 * Encode header lines
	 * Email headers must be ASCII, therefore they will be encoded to quoted_printable (default) or base64.
	 *
	 * @param	string		Content to encode
	 * @param	string		Encoding type: "base64" or "quoted-printable". Default value is "quoted-printable".
	 * @param	string		Charset used for encoding
	 * @return	string		The encoded string
	 */
	function encodeHeader($line,$enc='quoted-printable',$charset='ISO-8859-1')	{
			// Avoid problems if "###" is found in $line (would conflict with the placeholder which is used below)
		if (strstr($line,'###'))	return $line;

			// Check if any non-ASCII characters are found - otherwise encoding is not needed
		if (!preg_match('/[^'.chr(32).'-'.chr(127).']/',$line))	return $line;

			// Wrap email addresses in a special marker
		$line = preg_replace('/([^ ]+@[^ ]+)/', '###$1###', $line);

		$matches = preg_split('/(.?###.+###.?|\(|\))/', $line, -1, PREG_SPLIT_NO_EMPTY);
		foreach ($matches as $part)	{
			$oldPart = $part;
			switch ((string)$enc)	{
				case 'base64':
					$part = '=?'.$charset.'?B?'.base64_encode($part).'?=';
				break;
				case 'quoted-printable':
				default:
					$qpValue = t3lib_div::quoted_printable($part,1000);
					if ($part!=$qpValue)	{
						$qpValue = str_replace(' ','_',$qpValue);	// Encoded words in the header should not contain non-encoded spaces. "_" is a shortcut for "=20". See RFC 2047 for details.
						$part = '=?'.$charset.'?Q?'.$qpValue.'?=';
					}
				break;
			}
			$line = str_replace($oldPart, $part, $line);
		}
		$line = preg_replace('/###(.+?)###/', '$1', $line);	// Remove the wrappers

		return $line;
	}
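
For a setup where the TYPO3 class is not available, the base64 branch above can be approximated with plain PHP. This is only a sketch under assumptions: `build_from_header` is a hypothetical name, the display name and address are sample values, and the name is assumed short enough to fit in one encoded-word (RFC 2047 limits an encoded-word to 75 characters, so longer names would have to be split).

```php
<?php
// Hypothetical helper, neither streber nor TYPO3 code: wraps a display name in
// a single RFC 2047 base64 encoded-word and appends the address unchanged.
// Assumes $name is already encoded in $charset and is short (see note above).
function build_from_header($name, $email, $charset = 'UTF-8')
{
    // Names made only of printable ASCII are passed through as they are.
    if (!preg_match('/[^\x20-\x7E]/', $name)) {
        return $name . ' <' . $email . '>';
    }
    // Only the display name is encoded; the address stays readable ASCII.
    return '=?' . $charset . '?B?' . base64_encode($name) . '?= <' . $email . '>';
}

// Sample values for illustration only.
$from = build_from_header('Системное уведомление', 'do-not-reply@example.org');
echo 'From: ' . $from . "\n";
```

Keeping the address outside the encoded-word mirrors what the marker handling in encodeHeader() does for anything containing "@".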

I hope to provide a patch for streber in a few days.


pixtur:thanks for the quick reply...

10 years ago

I would have had no idea how to deal with this problem.