New ask Hacker News story: Web Pages via Email – Syntax?

3 by graycat | 2 comments on Hacker News.
A problem, a solution that is at least partially complete, some documentation of that solution, and a question: is there more to the problem, and more needed for a complete solution?

The situation and problem: I received some email that contained a Web page. That is, I used the Firefox Web browser to go to my email provider's Web site, got their Web page showing email sent to me, read an email in that page, and in that email read a Web page (e.g., from Goldman-Sachs, but similarly from Macy's, a university, a symphony orchestra, ...). So the email (SMTP, Simple Mail Transfer Protocol) made use of MIME (Multipurpose Internet Mail Extensions), and the HTML from Goldman-Sachs was in a MIME part. The results looked fine: Firefox, my email provider, and Goldman-Sachs all did fine.

But I would like to have the Web page I received, the one from Goldman-Sachs, in a file, say, A.HTM, that I could give to Firefox and again get the Goldman-Sachs Web page. So, using just a text editor, I copied the HTML from the MIME part to a file A.HTM. I gave file A.HTM to Firefox and got only a mess: some of the text was visible, but the formatting was a disaster. Congrats to Firefox for displaying the stuff at all!

So, Internet Secret 101 A: the HTML data in the MIME part had some syntax changes, and to get a file B.HTM that displays the Web page I was sent (e.g., by Goldman-Sachs) the way it is supposed to, and did, I need to undo those syntax changes!

Why try to get the file B.HTM? Maybe such a file will prove to be a welcome part of some future email handling. Maybe. One might guess that MIME parts could carry the HTML with no syntax changes at all, but apparently the changes are popular instead. So, what are the changes?
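For anyone who wants to see where such changes are declared: each MIME part can carry a `Content-Transfer-Encoding` header naming how its body was encoded. The sketch below uses only Python's standard `email` package to list each leaf part's content type and transfer encoding, and to pull out the HTML with that encoding undone. The sample message, its boundary `XYZ`, and the function names `describe_parts` and `extract_html` are hypothetical illustrations, not taken from the emails described above.

```python
from email import message_from_bytes
from email.policy import default


def describe_parts(raw: bytes) -> list[tuple[str, str]]:
    """List (content type, Content-Transfer-Encoding) for each leaf MIME part."""
    msg = message_from_bytes(raw, policy=default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers like multipart/alternative have no body of their own
        # RFC 2045: a missing Content-Transfer-Encoding header means "7bit".
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        parts.append((part.get_content_type(), cte))
    return parts


def extract_html(raw: bytes) -> str:
    """Return the text/html body with its transfer encoding and charset undone."""
    msg = message_from_bytes(raw, policy=default)
    body = msg.get_body(preferencelist=("html",))
    if body is None:
        raise ValueError("no text/html part found")
    return body.get_content()


# Hypothetical sample: a two-part message whose HTML part is quoted-printable.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"Hello\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<p class=3D"x">Hello</p>\r\n'
    b"--XYZ--\r\n"
)

if __name__ == "__main__":
    print(describe_parts(SAMPLE))
    print(extract_html(SAMPLE))
```

To use this on a real email, the raw message source is needed rather than the rendered page; how to save that source depends on the email provider.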
I didn't see any documentation, so I just looked at some examples, guessed, and saw two changes:

(1) The lines of HTML bytes are broken (split) so that none is more than 72 characters long, and an equal character is appended to each broken line to indicate that, to undo the split, you should remove the equal and append the next line. This is the "soft line break" of MIME's quoted-printable encoding (RFC 2045, which allows encoded lines of up to 76 characters), not a feature of SMTP itself.

(2) The really popular characters 0-9, a-z, and A-Z represent themselves, but many other characters are encoded; e.g., a hyphen (minus sign) can be replaced with =2D. The 2 and the D are each a hexadecimal digit for 4 bits, where the 2 represents bits 0010 and the D represents bits 1101. So 2D is the 8-bit byte 0010 1101, which is the hyphen character (a period would be =2E). There are more details and characters at https://www.w3schools.com/charsets/ref_utf_basic_latin.asp

So, to get file B.HTM, I have to undo syntax changes (1) and (2) in file A.HTM, and I wrote some code to do that undo. On some examples, the one from Goldman-Sachs and some others, the code seems to work: I got a file B.HTM that Firefox displays apparently fine, so changes (1) and (2) seem to be enough.

So, my question: I didn't see (1) and (2) documented, so I had to guess. What else is there to the syntax change beyond (1) and (2)?
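The two undo steps, (1) and (2), can be sketched in Python roughly as follows. This is a simplified illustration, not a complete decoder; the names `undo_qp` and `convert_file` are hypothetical, and the standard library's `quopri.decodestring`, which implements quoted-printable decoding, is used only as a cross-check. The work is done on bytes rather than characters because an escape sequence such as `=E2=80=99` spells out the individual UTF-8 bytes of a single character (here, a right single quotation mark). Also, in this encoding the equal character itself is always written escaped, as `=3D`.

```python
import quopri
import re

# (1) A soft line break: "=" at the very end of a line (CRLF or bare LF,
# since a text editor may have changed the line endings).
SOFT_BREAK = re.compile(rb"=\r?\n")
# (2) An escape: "=" followed by two hexadecimal digits giving one byte's value.
HEX_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")


def undo_qp(encoded: bytes) -> bytes:
    """Undo soft line breaks, then =XX escapes (simplified quoted-printable decoding)."""
    joined = SOFT_BREAK.sub(b"", encoded)
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), joined)


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read the copied MIME part (e.g. A.HTM), write decoded bytes (e.g. B.HTM)."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(undo_qp(data))


# The example from the post: hex 2D is binary 0010 1101, the hyphen-minus.
assert int("2D", 16) == 0b0010_1101 == ord("-")

if __name__ == "__main__":
    sample = b'<p class=3D"intro">Welcome to today=E2=80=99s news=\r\nletter.</p>\r\n'
    decoded = undo_qp(sample)
    assert decoded == quopri.decodestring(sample)  # agrees with the stdlib decoder
    print(decoded)
```

Doing step (1) before step (2) matters: if the escapes were expanded first, an escaped equal character (`=3D`) at the end of a line would become a bare `=` and then be mistaken for a soft line break. A fuller decoder would also handle details this sketch skips, such as trailing spaces that mail relays may add to lines, so `quopri.decodestring` or the `email` package is the safer choice for real use. The decoded bytes are in the charset named in the MIME part's `Content-Type` header (often UTF-8); if the HTML does not declare that charset itself, Firefox may have to guess it when opening B.HTM.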
