olavk
|
cached 8 days ago
|
|
cached 7 days ago
|
|
cached 23 days ago
Or just one type of string: unicode character strings (which is sequences of unicode code points). Then a seperate type for byte arrays. Byte arrays are not character strings, but can easily be translated into a string (and back).
|
|
cached 23 days ago
Unicode breaks in the hello-world webapp. E.g. if you write
You get some strange looking text in you browsers. This seem to be because arc is generating UTF-8 output (which I think is MzScheme default) but not declaring the encoding, which will make most browsers default to interpret it as iso8859-1.
It seem to be fixed by changing svr.asc line 105 to
|
|
cached 23 days ago
...and this seem to be exactly what MzScheme provides :-)
Strings in MzScheme are sequences of unicode code points. "bytes" is a seperate type which is a sequence of bytes. There are functions to translate between the two, given an encoding.
Python 3000 is close to this, but I think Python muddles the issue by providing character-releated operations like "capitalize" and so on on byte arrays. This is bound to lead to confusion. (The reason seem to be that the byte array is really the old 8-bit string type renamed. Will it never go away?) MzScheme does not have that issue. |
|
cached 23 days ago
That just shows how agile PG is. He added unicode support the minute he saw people request it! :)
Seriously, PG explicitly claims that Arc intentionally doesn't support anything but ASCII (http://www.arclanguage.org/), so that might be why people (including me) believed that to be the case. |
|
cached 23 days ago
Thank you very much for the clarification! People got riled up because it sounded like you didn't want Arc to support unicode ever (or that the current support would be removed). As long as the language is in flux its not a problem.
However, fundamental Unicode support probably has to be in place before release 1.0. It will be painful to add at a later time if backwards compatibility is an issue. For example a lot of string processing code might assume that accessing characters by index is constant time. If the internal representation is changed to eg. UTF-8 this might lead to performance issues. On the other hand, if code assumes that strings are equivalent to byte-arrays, it might lead to trouble if they are changed to arrays of 32bit-values. I believe the simplest solution is to just have characters be 32bit integers. The internal representation of a string is just an array of 32bit characters. Sure this consumes more space, but who cares? As long as strings are a type seperate from byte-arrays, encoding/decoding issues and can be handled in libraries. |
|
cached 12 days ago
Another option is to have all necessary information about the current operation in the URL. This is highly scalable, since you don't need to keep track of anything user-specific on the server(s), and the navigation supports branching and back/undo just like you describe. Strangely enough PG specifically disallow this approach in his competition! Most web apps need _both_ global session state and URL-based state. As others have pointed out, if you browse a product catalog, you would like to be able to branch into different browser windows or use the back-button. However, when you add an item to the shopping basket, you want it to be a global state change (you want have the same shopping basket in all windows), and you don't want a buy to be undone by clicking back. Continuations are only an options for handling URL-based state, not for handling global state. And for page state they have some limitations. For example, if all navigation is handled by continuations, you basically have to store a continuation for every hit indefinitely, since you dont know if the user have bookmarked the URL. If you don't want to store the continuations forever, you should only use them on pages that are not bookmarkable anyway, i.e. pages that are the response to form posts. But then the stated advantages, like the ability to branch and use the back button is moot, since you cannot do that anyway with form responses. Continuations are really nifty for quick prototypes of web apps, but for production use, I believe they are a leaky abstraction. |
|
cached 11 days ago
I believe that e.g. accented characters like é are implemented as a single glyph in fonts, but are composed of two unicode code points: the base character (e) and a modifier character (´).
This is complicated by the issue that unicode also supports the combined character as a seperate single code point, for backwards compatibility with legacy character sets. However the decomposed (normalized) form is the recommended. |
|
cached 8 days ago
Writing flash apps in arc - that would be a _really_ cool showcase.
|
