Re: [PATCH 1/5] hhctrl.ocx: Add HTML to Unicode parsing capability.

10 Jun 2012


On 6/8/12 11:18 PM, Erich E. Hoover wrote:
...
On Fri, Jun 8, 2012 at 8:17 AM, Jacek Caban <jacek@codeweavers.com> wrote:
...
...
I don't know of any helper API for that. Writing a decoder for HTML-encoded
characters sounds like a good solution.
How does something like the attached sound?
A few comments:
You definitely don't need a new header file for just one function
declaration. Even the implementation probably doesn't need a separate
file (it's <200 lines of code that is unlikely to grow).
+#include "hhctrl.h"
+#include <mshtml.h>
Probably left from the previous patch?
+        spc = strchr(amp, ' ');
+        if(spc && spc < sem)
+            break; /* cannot have a space between the ampersand and the semicolon */
This should not be needed (see above).
+        /* Convert the characters prior to the HTML encoded character */
+        wlen = MultiByteToWideChar(CP_ACP, 0, h, len, NULL, 0);
+        MultiByteToWideChar(CP_ACP, 0, h, len, w, wlen);
One call should be enough. You can just pass the remaining space in the
output buffer as its length.
+        if(amp[0] != '#')
+        {
+            for(i=0;i<sizeof(html_encoded_symbols)/sizeof(html_encoded_symbols[0]);i++)
+            {
+                const char *encoded_symbol = html_encoded_symbols[i].html_code;
+
+                if(strncmp(encoded_symbol, amp, len) == 0)
+                {
+                    symbol = html_encoded_symbols[i].ascii_symbol;
+                    break;
+                }
+            }
+        }
Binary search sounds like a good choice here (although just a FIXME
comment would be fine for this patch).
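
For illustration only, a minimal sketch of what a binary-search lookup could look like using the standard bsearch(). The struct and field names follow the quoted patch, but the table contents, the entity_key helper and the find_html_symbol name are assumptions made up for this example, and it requires the table to be kept sorted by html_code:

```c
#include <stdlib.h>
#include <string.h>

/* Table entry; field names follow the quoted patch, contents are illustrative. */
struct html_encoded_symbol
{
    const char *html_code;
    char ascii_symbol;
};

/* Must stay sorted by html_code (strcmp order) for bsearch to work. */
static const struct html_encoded_symbol html_encoded_symbols[] =
{
    {"amp",  '&'},
    {"gt",   '>'},
    {"lt",   '<'},
    {"quot", '"'},
};

/* Hypothetical key type: the entity name is not NUL-terminated in the
 * input, so its length is carried alongside the pointer. */
struct entity_key
{
    const char *name;
    size_t len;
};

static int entity_cmp(const void *key, const void *elem)
{
    const struct entity_key *k = key;
    const struct html_encoded_symbol *e = elem;
    int ret = strncmp(k->name, e->html_code, k->len);

    if (ret) return ret;
    /* Equal over k->len bytes: it is a match only if the table name ends here too. */
    return e->html_code[k->len] ? -1 : 0;
}

/* Returns the decoded symbol, or 0 if the name is not in the table. */
static char find_html_symbol(const char *name, size_t len)
{
    struct entity_key key = {name, len};
    const struct html_encoded_symbol *found;

    found = bsearch(&key, html_encoded_symbols,
                    sizeof(html_encoded_symbols)/sizeof(html_encoded_symbols[0]),
                    sizeof(html_encoded_symbols[0]), entity_cmp);
    return found ? found->ascii_symbol : 0;
}
```

Unlike the strncmp() in the quoted loop, the comparator also checks that the table name ends where the key does, so a prefix such as "am" does not match "amp".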
+        {
+            int tmp;
+
+            sscanf(amp, "%d", &tmp);
+            symbol = tmp;
+        }
This will decode "&#123xxx;" as 123 instead of an invalid char. If you
get it right, the earlier check for a space won't be needed. strtol is
probably a better tool for this.
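
A rough sketch of the strtol approach, assuming the input pointer sits just past the "&#" prefix. The function name parse_numeric_entity and the -1 return value for invalid input are hypothetical choices for this example, not part of the patch:

```c
#include <errno.h>
#include <stdlib.h>

/* digits points just past "&#". Returns the character code, or -1 if the
 * text up to the semicolon is not a plain decimal number. */
static long parse_numeric_entity(const char *digits)
{
    char *end;
    long value;

    /* strtol itself would accept leading whitespace and a sign,
     * so require the first character to be a digit. */
    if (*digits < '0' || *digits > '9')
        return -1;

    errno = 0;
    value = strtol(digits, &end, 10);

    /* The number must run right up to the terminating semicolon. */
    if (errno == ERANGE || *end != ';')
        return -1;

    return value;
}
```

Because the end pointer has to land on the semicolon, "123xxx;" and "12 3;" are both rejected, which is why the separate check for a space becomes unnecessary.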
+            wlen = MultiByteToWideChar(CP_ACP, 0, &symbol, 1, NULL, 0);
+            MultiByteToWideChar(CP_ACP, 0, &symbol, 1, w, wlen);
Same here, two calls are not needed.
Cheers,
Jacek
