Bugzilla – Full Text Bug Listing
|Summary:||VNC keyboard problems|
|Product:||ThinLinc||Reporter:||Pierre Ossman <firstname.lastname@example.org>|
|Component:||VNC||Assignee:||Peter Åstrand <email@example.com>|
|Status:||NEW||QA Contact:||Bugzilla mail exporter <firstname.lastname@example.org>|
|Bug Depends on:||27, 66, 342, 400, 705, 858, 965, 1431, 1862, 1868, 1919, 1973, 1982, 1983, 2130, 2139, 2447, 2493, 2653, 3074, 3114, 3156, 3167, 3223, 3365, 3417, 3511, 3522, 3704, 3761, 4068, 4207, 4524, 4526, 4548, 4560, 4602, 4654, 4670, 4677, 4808, 4865, 4971, 4984, 5135, 5205, 5226, 5228, 5229, 5230, 5258, 5269, 5272, 5550, 5931, 5932, 5933, 5934, 5935, 7175, 7237, 7259, 7276, 7281, 7282, 7326|
This is a tracker/documentation bug for all the fun and happiness the world of keyboards can bring.
First a primer on how keyboards work:

Keyboards are essentially two input devices rolled into one.

First and foremost they are a set of buttons that each can generate press and release events. On this level they are very similar to other input devices like mice and joysticks. Very few applications are interested in this model though, and it is normally only used by games and similar programs.

Secondly, keyboards are a way to input symbols. This is a very complex system involving state that changes depending on previous press and release events. E.g. the symbols A and a are both generated by the same button, but which one you get depends on whether the key press has been preceded by, for example, a Shift press or a Caps Lock press and release.

This duality is the root of most (all?) problems related to keyboards.
Next is how the keyboards are seen from applications.

At the core of X11 are keycodes. These map to physical buttons in an undefined manner that is controlled by whatever driver interfaces with the physical keyboard. Since they are undefined, applications should make no assumptions about which key they represent. This level corresponds to the first input model of keyboards.

On top of this are X11 keysyms. These represent the symbols and they are more heavily standardised. This is what all well behaved applications should look at.

Translation from keycodes to keysyms is done through a mapping table and is performed in every application (in libX11 though, so it's fairly transparent). The mapping itself is stored in the X11 server, and the applications are notified whenever it changes. The primary mapping is from a keycode to a set of keysyms. Which keysym to use from this set depends on the current keyboard "state", which is also provided by the server. As an example the "4" key on the keyboard maps to the symbols "4", "¤" and "$" depending on the current state.

Keyboard state is administered by the X11 server and has its own mapping table to determine how it is composed. Each bit in the state is called a modifier and is toggled by one or more keycodes. Multiple keycodes are required so that e.g. left and right shift both have the same effect.
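To make the keycode/state/keysym relationship above a bit more concrete, here is a minimal sketch in Python. It is not how libX11 is implemented. The keycode values, the modifier bit positions and the table layout are all made up for illustration, and it assumes the simple four column model (plain, Shift, Mode_switch, Mode_switch+Shift).

```python
# Illustrative only: bit positions and keycodes are assumptions, not values
# taken from any real X server.
SHIFT = 1 << 0
MODE_SWITCH = 1 << 1  # in reality bound to one of mod1 through mod5

# keycode -> four columns: plain, Shift, Mode_switch, Mode_switch+Shift
KEYMAP = {
    13: ["4", "¤", "$", None],   # the "4" key from the example above
    38: ["a", "A", None, None],
}


def lookup_keysym(keycode, state):
    """Pick a symbol for a keycode given the current modifier state."""
    columns = KEYMAP.get(keycode)
    if columns is None:
        return None  # nothing is mapped to this keycode
    index = 0
    if state & SHIFT:
        index += 1
    if state & MODE_SWITCH:
        index += 2
    return columns[index]


print(lookup_keysym(13, 0))            # plain press
print(lookup_keysym(13, SHIFT))        # with Shift held
print(lookup_keysym(13, MODE_SWITCH))  # with Mode_switch held
```

The point of the sketch is that the same keycode gives different symbols purely because of state that was built up by earlier events.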
There are two special state handling mechanisms that deviate from the above description:

1. Dead keys

   This is handled by every application (again, transparently by libX11). libX11 has a list of "compositions" which it uses for two things:

   - It tells the application which events to ignore (using XFilterEvent()).
   - It inserts a fake press event with the composed character, using keycode 0.

   E.g. pressing ~ (dead tilde) and n will generate:

   1. press of dead tilde (ignored)
   2. release of dead tilde
   3. press of n (ignored)
   4. press of ñ (faked)
   5. release of n

2. Lock keys (num lock, caps lock, etc.)

   First of all, libX11 treats these a bit specially in that it not only looks at the modifier, but also at which keysym (note keysym, not keycode!) is bound to that modifier. The ones that have special meaning are:

   - Caps_Lock, ISO_Lock or Shift_Lock bound to modifier Lock
   - Mode_Switch or Num_Lock bound to any of mod1 through mod5.

   For all the *_Lock keysyms there is another peculiarity in that they are toggled with every key press, not turned on by press and off by release like most modifiers. This is special voodoo that is hard coded for those specific keysyms and is done in dix/getevents.c.
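The dead key sequence described above can be simulated with a few lines of Python. This is only a sketch of the observable behaviour (which events the application ends up acting on), not of libX11's actual compose machinery. The compose table, the keycodes 35 and 57 and the function name are made up for the example.

```python
# Hypothetical compose table: (dead key, following key) -> composed symbol
COMPOSE = {("dead_tilde", "n"): "ñ"}
DEAD_KEYS = {first for first, _ in COMPOSE}


def events_seen_by_application(events):
    """Take (kind, keycode, keysym) events and return those the application
    acts on, with filtered presses dropped and the faked press added."""
    pending = None  # dead key waiting for its partner
    seen = []
    for kind, keycode, keysym in events:
        if kind == "press":
            if pending is None and keysym in DEAD_KEYS:
                pending = keysym  # press of the dead key is ignored
                continue
            if pending is not None:
                composed = COMPOSE.get((pending, keysym))
                pending = None
                if composed is not None:
                    # real press is ignored, a fake one with keycode 0 is added
                    seen.append(("press", 0, composed))
                    continue
        seen.append((kind, keycode, keysym))
    return seen


typed = [
    ("press", 35, "dead_tilde"),
    ("release", 35, "dead_tilde"),
    ("press", 57, "n"),
    ("release", 57, "n"),
]
for event in events_seen_by_application(typed):
    print(event)
```

Note that the application still sees the two releases, and that the faked press has no usable keycode. Anything that tries to pair presses with releases on a keycode basis gets confused by this.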
The new kid on the block is XKB, the X Keyboard Extension. Currently we have no support for this in our VNC, but it's been the standard system on desktops for ages so we need to get there eventually.

XKB basically replaces everything except the keycodes, which are still how keyboard events are sent out by the X11 server. XKB uses concepts similar to the old keysym mapping and modifier system but extends them and brings some new additions:

- The system for translating a keycode to a keysym with regard to state is now more complex, with the ability to look at more than shift state and mode switch. XKB uses a concept called groups for the different alternative keysyms for each keycode.
- Applications that want to use the first keyboard model (buttons, not symbols) now have a hardware independent way to do this, as XKB provides a mapping from keycodes to abstract "key names".
- XKB provides a lot of extra metadata like the physical layout of keyboards.
- XKB also adds some more tweakability of the X11 keyboard handling, like giving applications the ability to individually turn auto repeat on and off.

A full description of XKB and how it affects us will have to wait until someone here fully understands it. Until then the wikipedia article gives a decent overview: http://en.wikipedia.org/wiki/Xkb
Now for the common problems with keyboards.

1. Looking at keycodes.

As mentioned, keycodes are not standardised, are hardware dependent and generally shouldn't be used directly. Unfortunately some applications do, as they have a need for the first keyboard input model. This can be games or applications that need to interface with some other system (like wine, rdesktop or vmware) where physical key presses are the only interface available.

The reason these programs have done this, even though in theory it shouldn't work, is the monoculture on PCs. The only X11 server available was XFree86 and the only keyboard was the standard PC AT keyboard. This meant that keycodes were stable on that platform and people started assuming that they would always be stable.

On VNC this causes problems, as VNC normally doesn't have any reliable mapping for keycodes and sets up mappings for keysyms dynamically. The workaround has been to bootstrap VNC with the PC AT keyboard mapping. Unfortunately there is one mapping per keyboard layout. Although they share a lot of keys, they aren't identical. So to implement this workaround fully, we'd have to load every existing PC AT keyboard mapping into VNC and let the user pick the one that matches the local keyboard on the client.

A few years back, this started breaking on normal desktops as well, as Xorg started moving away from the PC AT keyboard interface and on to the Linux input abstraction. Unfortunately this new monoculture is even stronger than the last one, as the whole point of the Linux input system is to give applications a consistent key mapping no matter the hardware. It is therefore likely that the silliness of looking at keycodes will continue.

The proper solution for programs wanting the first input model is of course to switch to XKB, where there is a standardised hardware independent abstraction for physical keys. It unfortunately means we need to redesign VNC to deal with XKB properly first though.

2. Remote desktops

Because of the way the keyboard interface is designed for applications, it is more or less impossible to create remote desktop systems and get them 100% correct. The core issue is that you can either transfer the first input model, or the second one, but not both.

Example A, transferring the first model (RDP):

Pressing the button right of L registers as the same button pressed both on the local and the remote system. However the local system displays "ö" and the remote system displays ";". This is because they do not share the same view of how to map physical keys to symbols.

Example B, transferring the second model (VNC):

Pressing ";" shows up as the same symbol both locally and remotely. However, locally the application will see the ";" button pressed and released. The remote applications will instead see a faked Shift press, a faked "," press, a faked "," release and a faked Shift release. Again, this is caused by mappings that differ.

If you assume that both systems are identical in infrastructure (e.g. both X11 with XKB), then it could be possible to synchronise the systems and get consistent behaviour. But VNC is supposed to be cross platform, and if you look at something like rdesktop, which has to map the X11 keyboard model to the Windows one, then it's soon obvious that solving this problem for the general case is impossible.
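Example B can be sketched roughly like this. This is not our VNC server code, just an illustration of the idea. The remote key map, the keycode numbers and the function name are invented, and a real server handles press and release as separate messages rather than producing the whole sequence at once.

```python
# Hypothetical remote key map where ";" only exists as Shift+",".
# Keycode numbers are arbitrary.
SHIFT_KEYCODE = 50
REMOTE_KEYMAP = {
    59: (",", ";"),  # (plain, shifted)
    60: (".", ":"),
}


def fake_events_for(keysym):
    """Return the key events needed on the remote side to produce keysym,
    or None if no key currently generates it."""
    for keycode, (plain, shifted) in REMOTE_KEYMAP.items():
        if keysym == plain:
            return [("press", keycode), ("release", keycode)]
        if keysym == shifted:
            return [
                ("press", SHIFT_KEYCODE),   # faked Shift press
                ("press", keycode),
                ("release", keycode),
                ("release", SHIFT_KEYCODE),  # faked Shift release
            ]
    return None  # this is where a mapping would have to be allocated


print(fake_events_for(";"))
```

The user pressed a single unshifted key locally, but any remote application that looks at keys rather than symbols sees Shift plus a different key.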
The current state of wine seems to be a lot better than how it was initially described to me. They do look at keycodes, but they do not make any direct assumptions about these matching any standard driver (like XFree86 or evdev). The system works as follows:

Wine has two different systems for dealing with keys. The "main area" of the keyboard, containing 1-0, a-z and the smaller keys around them, is dealt with using a mapping from keycodes to Windows scan codes and vkeys. Every other key is more properly dealt with, on a pure keysym basis.

The mapping system is set up by using several built in mappings and trying to figure out which one most closely matches the keyboard presented in X11. For each keycode, wine checks the basic keysym and the shifted one and tries to find that pair in the mapping currently being evaluated. Every hit means a point for that mapping. The mapping with the most points is selected as the most likely layout to work.

Given that wine tries to achieve perfect translation of both keyboard input models at the same time (see section 2 in comment 5), this method is probably as good as it gets. It does rely on two assumptions though:

1. The keyboard layout will be more or less identical to one of wine's built in layouts. The keycodes can be shuffled around, but for every key in wine's layout there must be a key in X11 that does the same thing.

2. The keyboard layout is fully set up when wine starts and doesn't change.

It's probably assumption 2 that is causing all the problems for VNC. Since the mappings are allocated as needed, most of the keyboard will be missing when wine starts. This means that it has very low odds of picking a suitable mapping. Right now we solve this by having a Swedish mapping by default. We could also solve this more generically if we could transfer the relevant portion of the keyboard off the client somehow.
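The scoring described above boils down to something like the following sketch. It is written from the description in this comment, not from wine's source, so the layout tables, the names and the tie handling are all assumptions.

```python
# Tiny stand-ins for wine's built in layouts: sets of (plain, shifted) pairs.
BUILTIN_LAYOUTS = {
    "us": {("a", "A"), (";", ":"), ("4", "$")},
    "se": {("a", "A"), ("ö", "Ö"), ("4", "¤")},
}


def score_layouts(x11_keymap):
    """x11_keymap: keycode -> (plain, shifted). One point per matching pair."""
    scores = {name: 0 for name in BUILTIN_LAYOUTS}
    for pair in x11_keymap.values():
        for name, pairs in BUILTIN_LAYOUTS.items():
            if pair in pairs:
                scores[name] += 1
    return scores


def guess_layout(x11_keymap):
    """Pick the layout with the most points (ties go to the first one)."""
    scores = score_layouts(x11_keymap)
    return max(scores, key=scores.get)


# A fully populated Swedish keyboard is easy to identify...
full = {38: ("a", "A"), 47: ("ö", "Ö"), 13: ("4", "¤")}
print(guess_layout(full), score_layouts(full))

# ...but with most keys not yet allocated the scores say almost nothing.
sparse = {38: ("a", "A")}
print(score_layouts(sparse))
```

The second case is the situation on a VNC server that allocates mappings on demand: the scores tie and the pick is essentially arbitrary.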
Bug 400 is about making sure that the Num Lock state on the VNC server is the same as on the client. Since things like libX11 and wine do special processing based on this state, fixing it would avoid a lot of bugs. Bug 1919 and bug 2653 are somewhat related in that they detail the issue of Num Lock having different semantics on X11 and Windows.
Bug 1973 is a natural consequence of the workaround mentioned in comment #5. The Swedish keymap used to bootstrap VNC will only solve the bugs for Swedish users.
Bug 1983 is another issue with wine's keyboard handling, this time with how it deals with "non-dead" keys. This bug might no longer be relevant given the new behaviour of wine.
Bug 2493 is about fixing the VNC client CotVNC, which basically tried to do the first keyboard model even though VNC is designed for the second. Worked fine for CotVNC client to CotVNC server, but not very well with other (proper) VNC implementations.
Bug 3511 details a new variation on an old theme: monoculture making applications assume certain things that aren't always true. In this case it is about modifiers. As mentioned in comment 3, some modifiers are a bit special, so the first two (or possibly three) modifiers are, if not mandated, at least heavily implied to look a certain way. mod1 through mod5 have historically varied a bit though, but the current state is that these should be true:

- Alt and Meta should be present and be the same modifier.
- Num_Lock shall be present.
- Mode_Switch shall be present.
- The Windows key shall be present.

The last one is a sneaky one because the consensus on which keysym the Windows key should have has varied. Currently it is Super, but Hyper has been common in the past.

The current monoculture however is Xorg on a PC, and for similar reasons as for the workaround in comment 5, we need to bootstrap VNC with the expected "standard" to avoid triggering bugs in applications.
Bug 3522 touches on the subject mentioned in comment 5. Our current workaround is based on the old XFree86 PC AT keyboard driver, but the current de facto standard is the Linux input system. IOW our bootstrapping keyboard mapping no longer follows what applications are normally exposed to and therefore might no longer have the desired effect.
As part of bug 3074, it's time to get up close and personal with XKB.

The rationale behind XKB covers a lot of things, but the important changes relevant for us are:

- Nothing is implicit anymore (not quite true, but close enough)
- Symbol generation is now a lot more complex than the Shift/AltGr four column model

Some things are still familiar though:

- Clients still receive just the keycode and the state, and do the translation to a symbol themselves.
- The list of modifiers is the same (although they no longer have implicit actions)

== Key Names ==

In the core protocol, keycodes are raw values from the hardware and should never _ever_ be interpreted directly by applications (which is conveniently ignored left and right, see comment 5). In XKB there are well known names like <ESC> or <AE01> that are then mapped to keycodes. All other XKB configuration references the names, not the codes, making those portions hardware independent.

== Actions ==

As mentioned, XKB tries to do away with implicit magic. One big part of this is the introduction of actions. In the core protocol you would bind a keycode to the modifier Shift, and that would implicitly make that keycode toggle the modifier state. In XKB however, you would explicitly have to bind the _symbol_ Shift_L to the action SetMods(Shift). If you want a locking behaviour (like caps lock or num lock), you instead bind it to the action LockMods(). That means that every modifier can have locking behaviour, and which key does this is completely configurable.

Actions are bound to keys the same way as symbols are. So whenever the rest of the text refers to symbols, it generally also means actions.

Actions are also used for things like "mouse keys", which won't be covered here.

== Key Types ==

XKB introduces the concept of "key types". A type defines the number of symbols (or "levels") that a key can generate, and how the current set of modifiers chooses between these symbols. Each key picks its own type, so you cannot compare the symbol lists of two keys unless they have exactly the same type.

The system allows full control over modifier combinations. E.g. whether Shift "cancels" Lock depends on how your type is defined.

In the core protocol, every key would be of the same "type", have four levels and respond to Shift and Lock (numpad keys are an exception though).

== Key Map ==

The XKB key map serves the same purpose as it does for the core protocol; it maps between keycodes (via key names) and symbols. The difference is that now the type determines how to choose the column.

== Groups ==

XKB also has something called "groups". In principle they behave like modifiers, but there are only four groups in total and only one group is active at a given time. For each key, the type and key map are specified independently for each group.

In practice, groups are used to have multiple layouts loaded at once. So you can have Swedish and English configured at the same time, and the resulting symbols are changed by simply changing the active group. This makes things much more sane compared to having to reload the entire keyboard configuration.
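As a rough illustration of how key types, levels and groups fit together, here is a small Python model. It only mimics the concepts described above. The type names, modifier bits and data layout are invented and do not correspond to real XKB structures or to the types shipped with any real configuration.

```python
SHIFT = 1 << 0
LOCK = 1 << 1

# A type says which modifiers it cares about and which exact combinations
# select which level. Unlisted combinations give the first level (index 0).
KEY_TYPES = {
    # Shift or Lock gives level 2, but Shift+Lock is unlisted, so Shift
    # "cancels" Lock here.
    "letter": {"modifiers": SHIFT | LOCK, "map": {SHIFT: 1, LOCK: 1}},
    # Only Shift matters, Lock is ignored completely.
    "punctuation": {"modifiers": SHIFT, "map": {SHIFT: 1}},
}

# The key right of L, with a Swedish group and an English group.
KEY_RIGHT_OF_L = {
    "groups": [
        {"type": "letter", "symbols": ["ö", "Ö"]},
        {"type": "punctuation", "symbols": [";", ":"]},
    ]
}


def lookup(key, state, group):
    """Pick the symbol for a key given modifier state and active group."""
    entry = key["groups"][group]
    key_type = KEY_TYPES[entry["type"]]
    relevant = state & key_type["modifiers"]  # drop modifiers the type ignores
    level = key_type["map"].get(relevant, 0)
    return entry["symbols"][level]


print(lookup(KEY_RIGHT_OF_L, LOCK, 0))          # Swedish group, Lock on
print(lookup(KEY_RIGHT_OF_L, SHIFT | LOCK, 0))  # Shift cancels Lock
print(lookup(KEY_RIGHT_OF_L, LOCK, 1))          # English group ignores Lock
```

Since the type is chosen per key and per group, the same modifier state can mean different levels on different keys, which is exactly why the symbol lists of two keys cannot be compared without also comparing their types.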
So XKB allows for a lot more flexibility in how symbols are generated, and it makes life easier for applications that prefer the first input model (keys rather than symbols). Unfortunately it makes life very difficult for systems that try to run the system "in reverse" and figure out which set of keycodes will result in one specific symbol (e.g. VNC). A proper such implementation would be massively complex and is probably unfeasible. Some kind of heuristic will have to be applied instead.
One more thing about XKB:

== Compatibility Maps ==

These are an alternative way of specifying actions, and they seem to be the preferred method. Instead of binding an action to a key, you can bind it to a symbol. That way the action will automatically be moved around as you change your key map. Generally this is what you want, as you want the Shift modifier to be where the Shift_L/Shift_R symbols are, rather than fixed to specific keys.
Also some minor notes on how XKB handles compatibility with older Core protocol clients.

First of all, when an application uses the old Core API, it will still be using the XKB protocol and mechanisms. This is because libX11 is magically mapping things to XKB even if you are calling the older functions. The only way to truly use the Core protocol is to use a libX11 that was compiled without any XKB support at all. You'll have to find a really old system to encounter anything like that.

The X server does however try to be compatible with the Core protocol, should such an old libX11 appear. But unfortunately it fails miserably and rarely works in practice.

The biggest and most critical bug is in how it generates the Core symbol table. Since XKB has a much more flexible symbol lookup system, it cannot be represented with the simple four column table of the Core protocol. It could generate a table for the symbols when things happen to match (which is fairly common for most western layouts), but again it fails at even this simple task.

The XKB specification clearly describes how the Core table should be generated from the XKB layout. But the described algorithm ignores the flexibility of the XKB system, and makes the assumption that all XKB layout definitions follow [at least] these two rules:

a) Column 1 and 2 are selected based on the exact same Shift and Lock rules as specified by the Core protocol, no matter the key type.

b) Column 3 and 4 are what you get when you switch to the second XKB group, and that group switch is triggered by Mode_Switch.

The first assumption holds somewhat well for most layouts. Lock behaviour is generally different in XKB layouts compared to the Core protocol, but otherwise it maps rather well.

The second assumption is however horribly wrong in so many cases. The likely reason for this silly assumption is that in the Core protocol, column 1/2 and 3/4 are called groups. They are however very different from the four XKB groups, even if someone at some time during the drafting of XKB had some idea about similar use.

(There are even more weird aspects of this compatibility algorithm, but these are enough to screw things up)

The end result of this is that you get a Core symbol map that doesn't behave anywhere near how the XKB layout describes things. Most people don't notice, as your libX11 will probably be using the XKB stuff directly. But it's important to know that you cannot trust what tools like "xmodmap" give you.
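To show why assumption b) hurts, here is a deliberately simplified Python sketch that builds a four column Core table using nothing but the two assumptions above. It is not the algorithm from the XKB specification (which has more rules than this), and the data layout and function name are made up.

```python
def core_columns(xkb_groups):
    """xkb_groups: list of per-group symbol lists (one entry per level).
    Returns the four Core columns under assumptions a) and b) only."""

    def first_two_levels(symbols):
        padded = list(symbols[:2]) + [None, None]
        return padded[:2]

    group1 = first_two_levels(xkb_groups[0])
    if len(xkb_groups) > 1:
        group2 = first_two_levels(xkb_groups[1])
    else:
        group2 = [None, None]
    # Columns 1/2 come from XKB group 1, columns 3/4 from XKB group 2.
    return group1 + group2


# A "4" key with a single group where the third level holds "$", i.e. the
# symbol is reached via a level in the same group, not via a second group.
four_key = [["4", "¤", "$"]]
print(core_columns(four_key))
```

With such a layout the third level never makes it into the Core table, since the table only ever looks at a second group that this layout doesn't have.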