Bugzilla – Bug 3523
VNC keyboard problems
Last modified: 2019-09-03 10:40:40
This is a tracker/documentation bug for all the fun and happiness the world of
keyboards can bring.
First a primer on how keyboards work:
Keyboards are essentially two input devices rolled into one:
First and foremost they are a set of buttons that each can generate press and
release events. On this level they are very similar to other input devices like
mice and joysticks. Very few applications are interested in this model though
and it is normally only used by games and similar programs.
Secondly, keyboards are a way to input symbols. This is a very complex system
involving state that changes depending on previous press and release events.
E.g. the symbols A and a are both generated by the same button, and which one
you get depends on whether the key press has been preceded by, for example, a
Shift press or a Caps Lock press.
This duality is the root of most (all?) problems related to keyboards.
Next is how the keyboards are seen from applications.
At the core of X11 are keycodes. These map to physical buttons in an undefined
manner that is controlled by whatever driver interfaces with the physical
keyboard. Since they are undefined, applications should make no assumptions on
which key they represent. This level corresponds to the first input model of
On top of this are X11 keysyms. These represent the symbols and they are more
heavily standardised. This is what all well behaved applications should look
at.
Translation from keycodes to keysyms is done through a mapping table and is
performed in every application (in libX11 though, so it's fairly transparent).
The mapping is stored in the X11 server though and the applications are
notified whenever it changes.
The primary mapping is from a keycode to a set of keysyms. Which keysym to use
from this set depends on the current keyboard "state", which is also provided
by the server. As an example the "4" key on the keyboard maps to the symbols
"4", "¤" and "$" depending on the current state.
Keyboard state is administered by the X11 server and has its own mapping table
to determine how it is composed. Each bit in the state is called a modifier and
is toggled by one or more keycodes. Multiple keycodes are required so that e.g.
left and right shift both have the same effect.
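As a rough illustration of the lookup described above, here is a minimal Python sketch. The keycode value, the table contents and the choice of mod5 for Mode_Switch are assumptions made for the example. The real libX11 rules also deal with Lock, Num_Lock and case conversion, which are left out here.

```python
# Core protocol modifier bits. Shift is always bit 0; which of mod1..mod5
# carries Mode_Switch is up to the server, mod5 is just assumed here.
SHIFT = 1 << 0
MODE_SWITCH = 1 << 7

# keycode -> four columns: plain, Shift, Mode_Switch, Mode_Switch+Shift.
# The keycode and the symbols are example values only.
KEYMAP = {
    13: ["4", "¤", "$", None],
}

def lookup_keysym(keycode, state):
    """Pick a keysym for a keycode given the current modifier state."""
    columns = KEYMAP.get(keycode)
    if columns is None:
        return None
    index = 0
    if state & MODE_SWITCH:
        index += 2
    if state & SHIFT:
        index += 1
    keysym = columns[index]
    if keysym is None:
        # Simplification: an empty shifted column falls back to the
        # unshifted column of the same pair.
        keysym = columns[index & ~1]
    return keysym
```

With this table, lookup_keysym(13, 0) gives "4", adding SHIFT gives "¤" and adding MODE_SWITCH gives "$".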
There are two special state handling mechanisms that deviate from the above:
1. dead keys
This is handled by every application (again, transparently by libX11). libX11
has a list of "compositions" which it uses for two things:
- It tells the application which events to ignore (using XFilterEvent()).
- It inserts a fake press event with the composed character, using keycode 0.
E.g. pressing ~ (dead tilde) and n will generate:
1. press of dead tilde (ignored)
2. release of dead tilde
3. press of n (ignored)
4. press of ñ (faked)
5. release of n
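The event sequence above can be mimicked with a small Python sketch. This is not how libX11 is implemented; the event tuples, the composition table and the "real"/"fake" tags are all invented for the illustration. It only shows which events an application would end up seeing.

```python
# Hypothetical composition table: (dead key, following key) -> result.
COMPOSITIONS = {("dead_tilde", "n"): "ñ"}
DEAD_KEYS = {"dead_tilde"}

def filter_events(events):
    """Return the events an application sees after compose filtering.

    events is a list of (kind, keysym) with kind "press" or "release".
    Delivered events are tagged "real" or "fake" (the faked press is the
    one that would carry keycode 0).
    """
    delivered = []
    pending = None  # dead key waiting for its second key
    for kind, keysym in events:
        if kind == "release":
            # Releases are never swallowed in this sketch.
            delivered.append((kind, keysym, "real"))
        elif pending is None and keysym in DEAD_KEYS:
            pending = keysym  # press ignored
        elif pending is not None:
            composed = COMPOSITIONS.get((pending, keysym))
            pending = None
            if composed is not None:
                # Original press is ignored, a faked one is inserted.
                delivered.append(("press", composed, "fake"))
            # Unknown sequences are simply dropped here.
        else:
            delivered.append((kind, keysym, "real"))
    return delivered
```

Feeding it press/release of dead_tilde followed by press/release of n leaves the release of dead_tilde, a faked press of ñ and the release of n, matching the list above.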
2. Lock keys (num lock, caps lock, etc.)
First of all, libX11 treats these a bit special in that it not only looks at
the modifier, but also which keysym (note keysym, not keycode!) is bound to
that modifier. The ones that have special meaning are:
- Caps_Lock, ISO_Lock or Shift_Lock bound to modifier Lock
- Mode_Switch or Num_Lock bound to any of mod1 through mod5.
For all the *_Lock keysyms there is another peculiarity in that they are
toggled with every key press, not turned on by press and off by release like
most modifiers. This is special voodoo that is hard coded for those specific
keysyms and is done in dix/getevents.c.
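The difference between the two behaviours can be sketched in a few lines of Python. This is only a model of the behaviour described above, not the code in dix/getevents.c. The table is keyed on keysym for brevity, and a real server also keeps track of how many keys for the same modifier are held down.

```python
# Core protocol modifier bits for Shift and Lock.
SHIFT = 1 << 0
LOCK = 1 << 1

# Simplified binding of keysyms to modifier bits (example values).
MODIFIER_KEYS = {"Shift_L": SHIFT, "Shift_R": SHIFT, "Caps_Lock": LOCK}

def update_state(state, kind, keysym):
    """Return the new modifier state after a press or release event."""
    bit = MODIFIER_KEYS.get(keysym)
    if bit is None:
        return state
    if keysym.endswith("_Lock"):
        # *_Lock keysyms toggle on every press and ignore the release.
        if kind == "press":
            state ^= bit
        return state
    # Ordinary modifiers are on while the key is held.
    if kind == "press":
        return state | bit
    return state & ~bit
```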
The new kid on the block is XKB, the X Keyboard Extension. Currently we have no
support for this in our VNC, but it's been the standard system on desktops for
ages so we need to get there eventually.
XKB basically replaces everything except the keycodes, which are still how
keyboard events are sent out by the X11 server. XKB uses concepts similar to
the old keysym mapping and modifier system but extends them and brings some new
- The system for translating a keycode to a keysym with regard to state is now
more complex with the ability to look at more than shift state and mode switch.
XKB uses a concept called groups for the different alternative keysyms for each
- Applications that want to use the first keyboard model (buttons, not
symbols) now have a hardware independent way to do this as XKB provides a
mapping from keycodes to abstract "key names".
- XKB provides a lot of extra metadata like the physical layout of keyboards.
- XKB also adds some more tweakability of the X11 keyboard handling, like
providing applications the ability to individually turn on and off auto repeat.
A full description of XKB and how it affects us will have to wait until someone
here fully understands it. Until then the Wikipedia article gives a decent
Now for the common problems with keyboards.
1. Looking at keycodes.
As mentioned, keycodes are not standardised, are hardware dependent and
generally shouldn't be used directly. Unfortunately some applications do as
they have a need for the first keyboard input model. This can be games or
applications that need to interface with some other system (like wine, rdesktop
or vmware) and physical key presses are the only interface available.
The reason these programs have done this even though in theory it shouldn't
work is because of the monoculture on PCs. The only X11 server available was
XFree86 and the only keyboard was the standard PC AT keyboard. This meant that
keycodes were stable on that platform and people started assuming that they
would always be stable.
On VNC this causes problems as VNC normally doesn't have any reliable mapping
for keycodes and dynamically sets up mappings for keysyms. The workaround has
been to bootstrap VNC with the PC AT keyboard mapping. Unfortunately there is
one mapping per keyboard layout. Although they share a lot of keys, they aren't
identical. So to implement this workaround fully, we'd have to load every
existing PC AT keyboard mapping into VNC and let the user pick the one that
matches the local keyboard on the client.
A few years back, this started breaking on normal desktops as well as Xorg
started moving away from the PC AT keyboard interface and on to the Linux input
abstraction. Unfortunately this new monoculture is even stronger than the last
one as the whole point of the Linux input system is to give applications a
consistent key mapping no matter the hardware. It is therefore likely that the
silliness of looking at keycodes will continue.
The proper solution for programs wanting the first input model is of course to
switch to XKB where there is a standardised hardware independent abstraction
for physical keys. It unfortunately means we need to redesign VNC to deal with
XKB properly first though.
2. Remote desktops
Because of the way the keyboard interface is designed for applications, it is
more or less impossible to create remote desktop systems and get them 100%
correct. The core issue is that you can either transfer the first input model,
or the second one, but not both.
Example A, transferring the first model (RDP):
Pressing the button right of L registers as the same button pressed both on the
local and the remote system. However the local system displays "ö" and the
remote system displays ";". This is because they do not share the same view of
how to map physical keys to symbols.
Example B, transferring the second model (VNC):
Pressing ";" shows up as the same symbol both locally and remotely. However
locally the application will see the ";" button pressed and released. The
remote applications however will see a faked Shift press, a faked "," press, a
faked "," release and a faked Shift release. Again, this is caused by mappings
If you assume that both systems are identical in infrastructure (e.g. both X11
with XKB), then it could be possible to synchronise the systems and get
consistent behaviour. But VNC is supposed to be cross platform, and if you look
at something like rdesktop which has to map the X11 keyboard model to the
Windows one, then it's soon obvious that solving this problem for the general
case is impossible.
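To make Example B above a bit more concrete, here is a minimal Python sketch of how a server that only receives symbols has to work backwards to key events. The keycodes and the two-column keymap are assumptions for the example and this is not our actual server code.

```python
# Hypothetical server side keymap: keycode -> [plain, shifted].
SERVER_KEYMAP = {
    59: [",", ";"],
    47: ["ö", "Ö"],
}
SHIFT_KEYCODE = 50  # assumed keycode of a Shift key

def fake_events_for(keysym):
    """Return the fake key events needed to produce keysym, or None."""
    for keycode, columns in SERVER_KEYMAP.items():
        if keysym in columns:
            needs_shift = columns.index(keysym) == 1
            events = [("press", keycode), ("release", keycode)]
            if needs_shift:
                # Wrap the key in a faked Shift press and release.
                events = ([("press", SHIFT_KEYCODE)] + events +
                          [("release", SHIFT_KEYCODE)])
            return events
    return None  # symbol not present in the server's keymap
```

Asking for ";" gives Shift press, "," key press, "," key release, Shift release, which is what the remote application sees instead of a single ";" button.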
Bug 27 and bug 342 are about incompatibilities between VNC and wine/Crossover.
This was caused by the fact that wine was assuming the standard PC AT keyboard.
Fixed by bootstrapping VNC with a Swedish keymap.
The current state of wine seems to be a lot better than how it was initially
described to me. They do look at keycodes, but they do not make any direct
assumptions on these matching any standard driver (like XFree86 or evdev).
The system works as follows:
Wine has two different systems for dealing with keys. The "main area" of the
keyboard, containing 1-0, a-z and the smaller keys around them, are dealt with
using a mapping from keycode to Windows scan codes and vkeys. Every other key
is more properly dealt with on a pure keysym basis.
The mapping system is set up by using several built in mappings and trying to
figure out which one most closely matches the keyboard presented in X11.
For each keycode, wine checks the basic keysym and the shifted one and tries to
find that pair in the mapping currently being evaluated. Every hit means a
point for that mapping. The mapping with the most points is selected as the
most likely layout to work.
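The scoring described above can be sketched like this in Python. It is a simplified model and not wine's code. The layout names, the tiny layout tables and the keycodes are invented for the example.

```python
# Hypothetical built in layouts: sets of (plain, shifted) keysym pairs.
BUILTIN_LAYOUTS = {
    "us": {("a", "A"), (";", ":"), ("4", "$")},
    "se": {("a", "A"), ("ö", "Ö"), ("4", "¤")},
}

def pick_layout(x11_keymap, layouts=BUILTIN_LAYOUTS):
    """Score every layout against the X11 keymap and return the best.

    x11_keymap maps keycode -> (plain keysym, shifted keysym).
    """
    scores = {name: 0 for name in layouts}
    for plain, shifted in x11_keymap.values():
        for name, pairs in layouts.items():
            if (plain, shifted) in pairs:
                scores[name] += 1  # every hit is a point
    best = max(scores, key=scores.get)
    return best, scores
```

With a full Swedish keymap the "se" table wins clearly. If only a handful of keys are present in X11 the scores end up nearly tied and the pick is more or less arbitrary.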
Given that wine tries to achieve perfect translation of both keyboard input
models at the same time (see section 2 in comment 5), this method is probably
as good as it gets. It does rely on two assumptions though:
1. The keyboard layout will be more or less identical to one of wine's built in
layouts. The keycodes can be shuffled around, but for every key in wine's
layout there must be a key in X11 that does the same thing.
2. The keyboard layout is fully set up when wine starts and doesn't change.
It's probably assumption 2 that is causing all the problems for VNC. Since the
mappings are allocated as needed, most of the keyboard will be missing when
wine starts. This means that it has very low odds of picking a suitable
Right now we solve this by having a Swedish mapping by default. We could also
solve this more generically if we could transfer the relevant portion of the
keyboard off the client somehow.
Bug 400 is about making sure that the Num Lock state on the VNC server is the
same as on the client. Since things like libX11 and wine do special processing
based on this state, it is something that would avoid a lot of bugs.
Bug 1919 and bug 2653 are somewhat related to the issue in that they detail the
issue of Num Lock having different semantics on X11 and Windows.
Bug 1973 is a natural consequence of the workaround mentioned in comment #5.
The Swedish keymap used to bootstrap VNC will only solve the bugs for Swedish
Bug 1982 is about wine's keyboard handling mentioned in comment 7.
Bug 1983 is another issue with wine's keyboard handling, this time with how it
deals with "non-dead" keys.
This bug might no longer be relevant given the new behaviour of wine.
Bug 2447 is a stopgap measure for bug 1973 to add just the Brazilian keymap.
Bug 2493 is about fixing the VNC client CotVNC, which basically tried to do the
first keyboard model even though VNC is designed for the second. Worked fine
for CotVNC client to CotVNC server, but not very well with other (proper) VNC
Bug 3511 details a new variation on an old theme, monoculture making
applications assume certain things that aren't always true.
In this case it is about modifiers. As mentioned in comment 3, some modifiers
are a bit special so the first two (or possibly three) modifiers are if not
mandated, at least heavily implied to look a certain way. mod1 through mod5 have
historically varied a bit though, but the current state is that these should be
- Alt and Meta should be present and be the same modifier.
- Num_Lock shall be present.
- Mode_Switch shall be present.
- The Windows key shall be present.
The last one is a sneaky one because the consensus for which keysym the Windows
key should have has varied. Currently it is Super, but Hyper has been common in
The current monoculture however is Xorg on a PC, and for similar reasons as for
the workaround in comment 5, we need to bootstrap VNC with the expected
"standard" to avoid triggering bugs in applications.
Bug 3522 touches on the subject mentioned in comment 5. Our current workaround
is based on the old XFree86 PC AT keyboard driver, but the current de facto
standard is the Linux input system. IOW our bootstrapping keyboard mapping
no longer follows what applications are normally exposed to and therefore
might no longer have the desired effect.
As part of bug 3074, it's time to get up close and personal with XKB.
The rationale behind XKB covers a lot of things, but the important changes
relevant for us are:
- Nothing is implicit anymore (not quite true, but close enough)
- Symbol generation is now a lot more complex than the Shift/AltGr four column
Some things are still familiar though:
- Clients still receive just the keycode and the state, and do the translation
to a symbol themselves.
- The list of modifiers is the same (although they no longer have implicit
== Key Names ==
In the core protocol, keycodes are raw values from the hardware and should
never _ever_ be interpreted directly by applications (which is conveniently
ignored left and right, see comment 5). There are well known names like <ESC>,
or <AE01> that are then mapped to keycodes.
All other XKB configuration references the names, not the codes, making those
portions hardware independent.
== Actions ==
As mentioned, XKB tries to do away with implicit magic. One big part of this is
the introduction of actions.
In the core protocol you would bind a keycode to the modifier Shift, and that
would implicitly make that keycode toggle the modifier state.
In XKB however, you would explicitly have to bind the _symbol_ Shift_L to the
action SetMods(Shift). If you want a locking behaviour (like caps lock or num
lock), you instead bind it to the action LockMods(). That means that every
modifier can have locking behaviour, and which key does this is completely
Actions are bound to keys the same way as symbols are. So whenever the rest of
the text refers to symbols, it generally also means actions.
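A small Python sketch of the idea, where the behaviour comes from the bound action rather than from which keysym it is. The binding table and the split into "base" and "locked" modifiers are a simplified model. The real LockMods also looks at the release, which is ignored here.

```python
SHIFT = 1 << 0
LOCK = 1 << 1

# Hypothetical bindings of symbols to actions. Nothing stops a
# configuration from giving Shift a locking action instead.
ACTIONS = {
    "Shift_L": ("SetMods", SHIFT),
    "Caps_Lock": ("LockMods", LOCK),
}

def apply_action(base, locked, kind, symbol):
    """Return (base, locked) modifier sets after an event on symbol."""
    action = ACTIONS.get(symbol)
    if action is None:
        return base, locked
    name, mods = action
    if name == "SetMods":
        # Active only while the key is held down.
        base = base | mods if kind == "press" else base & ~mods
    elif name == "LockMods" and kind == "press":
        # Simplification: toggle the locked modifiers on each press.
        locked ^= mods
    return base, locked

def effective(base, locked):
    """Modifier state that is used when picking symbols."""
    return base | locked
```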
Actions are also used for things like "mouse keys", which won't be covered
== Key Types ==
XKB introduces the concept of "key types". It defines the number of symbols (or
"levels") that each key can generate, and how the current set of modifiers
chooses between these symbols. Each key picks its own type, so you cannot
compare the symbol list of two keys unless they have exactly the same type.
The system allows full control over modifier combinations. E.g. whether Shift
"cancels" Lock depends on how your type is defined.
In the core protocol, every key would be of the same "type" and have four
levels and respond to Shift and Lock (numpad keys are an exception though).
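A key type can be thought of as a mask of modifiers it cares about plus a table from modifier combinations to levels. The Python sketch below is only a model of that idea. The type definitions are written for the example and are not copied from any real XKB configuration.

```python
SHIFT = 1 << 0
LOCK = 1 << 1

# Each type: which modifiers it looks at, and which combinations map to
# which level (0 based). Combinations that are not listed give level 0.
TWO_LEVEL = {"mask": SHIFT, "levels": {SHIFT: 1}}
# Shift+Lock is not listed, so here Shift "cancels" Lock.
ALPHABETIC = {"mask": SHIFT | LOCK, "levels": {SHIFT: 1, LOCK: 1}}
# Same type, but with Shift+Lock listed so that nothing is cancelled.
ALPHABETIC_NO_CANCEL = {
    "mask": SHIFT | LOCK,
    "levels": {SHIFT: 1, LOCK: 1, SHIFT | LOCK: 1},
}

def level_for(key_type, state):
    """Return the level a key of this type selects for a given state."""
    active = state & key_type["mask"]  # modifiers outside the mask are ignored
    return key_type["levels"].get(active, 0)
```

So level_for(ALPHABETIC, SHIFT | LOCK) is 0 while level_for(ALPHABETIC_NO_CANCEL, SHIFT | LOCK) is 1, and a TWO_LEVEL key does not react to Lock at all.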
== Key Map ==
The XKB key map serves the same purpose as it does for the core protocol; it
maps between keycodes (via key names) and symbols. The difference is that now
the type determines how to choose the column.
== Groups ==
XKB also has something called "groups". In principle they behave like
modifiers, but there are only four groups in total and only one group is active
at a given time. For each key, the type and key map is specified independently
for each group.
In practice, groups are used to have multiple layouts loaded at once. So you
can have Swedish and English configured at the same time, and the resulting
symbols are changed by simply changing the active group. This makes things much
more sane compared to having to reload the entire keyboard configuration.
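Putting key names, types and groups together, a lookup could look roughly like the Python sketch below. The data layout is invented for the illustration, and the key name and symbols are example values for an English and a Swedish layout loaded side by side.

```python
SHIFT = 1 << 0

# A simple two level type: Shift selects the second level.
TWO_LEVEL = {"mask": SHIFT, "levels": {SHIFT: 1}}

# key name -> one (type, symbols) entry per group.
# Group 0 is assumed to be English and group 1 Swedish.
KEYS = {
    "<AC10>": [
        (TWO_LEVEL, [";", ":"]),
        (TWO_LEVEL, ["ö", "Ö"]),
    ],
}

def lookup(key_name, group, state):
    """Return the symbol for a key given the active group and state."""
    key_type, symbols = KEYS[key_name][group]
    level = key_type["levels"].get(state & key_type["mask"], 0)
    return symbols[level]
```

Switching layout is then just a matter of changing the group argument. The key map itself stays loaded.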
So XKB allows for a lot more flexibility in how symbols are generated, and it
makes life easier for applications that prefer the first input model (keys
rather than symbols).
Unfortunately it makes life very difficult for systems that try to run the
system "in reverse" and figure out which set of keycodes will result in one
specific symbol (e.g. VNC). A proper such implementation would be massively
complex and is probably unfeasible. Some kind of heuristic will have to be
One more thing about XKB:
== Compatibility Maps ==
These are an alternative way of specifying actions, and they seem to be the
Instead of binding an action to a key, you can bind it to a symbol. That way
the action will automatically be moved around as you change your key map.
Generally this is what you want as you want the Shift modifier to be where the
Shift_L/Shift_R symbols are, rather than fixed to specific keys.
Also some minor notes on how XKB handles compatibility with older Core protocol
First of all, when an application uses the old Core API, it will still be using
the XKB protocol and mechanisms. This is because libX11 is magically mapping
things to XKB even if you are calling the older functions. The only way to
truly use the Core protocol is to use a libX11 that was compiled without any
XKB support at all. You'll have to find a really old system to encounter
anything like that.
The X server does however try to be compatible with the Core protocol, should
such an old libX11 appear. But unfortunately it fails miserably and rarely
works in practice.
The biggest and most critical bug is in how it generates the Core symbol table.
Since XKB has a much more flexible symbol lookup system, it cannot be
represented with the simple four column table of the Core protocol. It could
generate a table for the symbols when things happen to match (which is fairly
common for most western layouts), but again it fails at even this simple task.
The XKB specification clearly describes how the Core table should be generated
from the XKB layout. But the described algorithm ignores the flexibility of the
XKB system, and makes the assumption that all XKB layout definitions follow [at
least] these two rules:
a) Column 1 and 2 are selected based on the exact same Shift and Lock rules as
specified by the Core protocol, no matter the key type.
b) Column 3 and 4 are what you get when you switch to the second XKB group,
and that group switch is triggered by Mode_Switch.
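The effect of these two assumptions can be shown with a short Python sketch. It only models the first four Core columns as described in a) and b) and is not the full algorithm from the XKB specification. The symbol lists are example data.

```python
NO_SYMBOL = None

def core_columns(groups):
    """Build the first four Core columns for one key.

    groups is a list of per group symbol lists, first XKB group first.
    Columns 1/2 come from the first two levels of group one and columns
    3/4 from the first two levels of group two.
    """
    row = []
    for g in range(2):
        symbols = groups[g] if g < len(groups) else []
        for level in range(2):
            row.append(symbols[level] if level < len(symbols) else NO_SYMBOL)
    return row
```

For a key defined as a single group with three levels, e.g. core_columns([["4", "¤", "$"]]), columns 3 and 4 come out empty, so the "$" on the third level does not end up where a Core client pressing Mode_Switch would look for it.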
The first assumption holds somewhat well for most layouts. Lock behaviour is
generally different in XKB layouts compared to the Core protocol, but otherwise
it maps rather well.
The second assumption is however horribly wrong in so many cases. The likely
reason for this silly assumption is that in the Core protocol, column 1/2 and
3/4 are called groups. They are however very different from the four XKB
groups, even if someone at some time during the drafting of XKB had some idea
about similar use.
(There are even more weird aspects of this compatibility algorithm, but these
are enough to screw things up)
The end result of this is that you get a Core symbol map that doesn't behave
anywhere near how the XKB layout describes things. Most people don't notice as
your libX11 will probably be using the XKB stuff directly. But it's important
to know that you cannot trust what tools like "xmodmap" give you.