www.cendio.com
Bug 4417 - Upgrade our X server
Status: CLOSED FIXED
: ThinLinc
VNC
: 3.4.0
: PC Unknown
: P2 Normal
: 4.1.0
: 3074
: 2968 3836 4262 4416
Reported: 2012-10-05 14:18 by
Modified: 2013-05-31 13:01 (History)
Description From cendio 2012-10-05 14:18:13
I've managed to get Gnome Shell working under the TigerVNC shipped in Fedora
17:

$ rpm -q xorg-x11-server-Xorg 
xorg-x11-server-Xorg-1.12.3-2.fc17.x86_64

It's slow, but it's a start - Ubuntu doesn't seem to build their VNC server
with the composite extension enabled, so that's a lost cause. Upgrading our X
server should allow us to run at least Gnome Shell, and possibly Unity, via
ThinLinc.
------- Comment #1 From cendio 2012-10-05 14:45:22 -------
Perhaps it's worth noting that I lied a little bit; Xvnc segfaults after a
short time running Gnome Shell. Posting the traceback here for reference.

Backtrace:
0: /usr/bin/Xvnc (xorg_backtrace+0x36) [0x591236]
1: /usr/bin/Xvnc (0x400000+0x194c99) [0x594c99]
2: /lib64/libpthread.so.0 (0x3083c00000+0xefe0) [0x3083c0efe0]
3: /usr/bin/Xvnc (_ZN11InputDevice8keyEventEjb+0x46) [0x506ca6]
4: /usr/bin/Xvnc (_ZN3rfb16VNCSConnectionST8keyEventEjb+0x100) [0x525f70]
5: /usr/bin/Xvnc (_ZN3rfb16VNCSConnectionST15processMessagesEv+0x38) [0x5252d8]
6: /usr/bin/Xvnc (_ZN14XserverDesktop13wakeupHandlerEP6fd_seti+0x177) [0x505b07]
7: /usr/bin/Xvnc (0x400000+0xfc49c) [0x4fc49c]
8: /usr/bin/Xvnc (WakeupHandler+0x6b) [0x54842b]
9: /usr/bin/Xvnc (WaitForSomething+0x1a6) [0x58ed46]
10: /usr/bin/Xvnc (Dispatch+0xa1) [0x544211]
11: /usr/bin/Xvnc (main+0x35a) [0x441c5a]
12: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3083821735]
13: /usr/bin/Xvnc (0x400000+0x4309d) [0x44309d]

Segmentation fault at address 0xb8

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting
------- Comment #2 From cendio 2012-10-05 15:14:17 -------
(In reply to comment #1)
> Perhaps it's worth noting that I lied a little bit; Xvnc segfaults after a
> short time running Gnome Shell.

https://bugzilla.redhat.com/show_bug.cgi?id=863431
------- Comment #3 From cendio 2012-11-20 16:21:17 -------
Should be easy since Fedora has already made sure Xvnc works with the latest
Xorg. It's still a lot of menial work upgrading all the dependencies though.

We cannot move all the dependencies into the build system because we have an
ancient libX11 on Solaris (which messes with the builds of other packages). But
maybe we can move some of the packages in there.
------- Comment #4 From cendio 2013-03-22 16:10:53 -------
I've got a working 1.14 in my working copy now. Going back to look at XKB for a
while.
------- Comment #5 From cendio 2013-04-04 16:44:50 -------
Vendor drop done in r26951. A lot of cleanup and integration work remains
though.
------- Comment #6 From cendio 2013-04-05 23:56:43 -------
*** Bug 3616 has been marked as a duplicate of this bug. ***
------- Comment #7 From cendio 2013-04-09 18:39:27 -------
As part of this I discovered that we weren't getting proper log output. So now
I've added -verbose to our Xvnc (stolen from Xorg) and set xserver_args to
"-verbose 3", which is Xorg's default.

Tester should verify that we get some X log lines (e.g. "(II)...") in
xinit.log.
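
A minimal sketch of how that check could be scripted. It assumes the usual Xorg
log markers ("(II)" informational, "(WW)" warning, "(EE)" error, and so on),
optionally preceded by a timestamp in square brackets. The function names are
made up for illustration, and the location of xinit.log is left to the caller.

```python
import re
from collections import Counter

# Xorg-style log lines carry a two-character marker in parentheses,
# optionally preceded by a "[ seconds ]" timestamp.
MARKER = re.compile(r"^(?:\[\s*[\d.]+\]\s*)?\((II|WW|EE|\*\*|==|\+\+|--|NI|\?\?|!!)\)")


def count_x_log_markers(lines):
    """Count log lines per marker, e.g. {"II": 12, "WW": 1}."""
    counts = Counter()
    for line in lines:
        match = MARKER.match(line.lstrip())
        if match:
            counts[match.group(1)] += 1
    return counts


def has_x_log_output(path):
    """Return True if the file at `path` has at least one "(II)" line."""
    with open(path, errors="replace") as log:
        return count_x_log_markers(log)["II"] > 0
```

Calling has_x_log_output() with the path to a session's xinit.log then gives a
quick yes/no answer for the verification step above.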
------- Comment #8 From cendio 2013-04-10 17:01:25 -------
The nightly build now has Xorg 1.14 and Mesa 9.1.1. New scripts and integration
for XKB will be handled on bug 3074.

Tester should verify as much X11 functionality as possible. GLX also needs to
be tested, both direct and indirect. glean/piglit/etc. are probably suitable.
------- Comment #9 From cendio 2013-04-23 14:40:52 -------
Broken on Solaris:

ld.so.1: Xvnc: fatal: libXau.so.6: open failed: No such file or directory
ld.so.1: Xvnc: fatal: libfreetype.so.6: open failed: No such file or directory
------- Comment #10 From cendio 2013-04-23 15:03:32 -------
(In reply to comment #9)
> Broken on Solaris:
> 
> ld.so.1: Xvnc: fatal: libXau.so.6: open failed: No such file or directory
> ld.so.1: Xvnc: fatal: libfreetype.so.6: open failed: No such file or directory

Fixed in r27149.
------- Comment #11 From cendio 2013-04-30 14:22:37 -------
With the latest nightly build, I do not get any mouse pointer when starting a
new session, i.e. in the profile selection dialog. After downgrading to
http://www.cendio.com/downloads/updates/b4547/ it works. 

Perhaps this has something to do with the fact that I'm using some fancy high
color cursors. I activated this a long time ago. I have no idea where this
setting is stored, but it's probably in my home dir somewhere.
------- Comment #12 From cendio 2013-05-02 09:57:52 -------
Libreoffice is exhibiting some redraw problem with its GTK labels (menus,
dialogs, etc.) on at least Fedora 18 and Ubuntu 12.04.
------- Comment #13 From cendio 2013-05-07 13:03:35 -------
(In reply to comment #12)
> Libreoffice is exhibiting some redraw problem with its GTK labels (menus,
> dialogs, etc.) on at least Fedora 18 and Ubuntu 12.04.

This seems to be a somewhat common error:

http://en.libreofficeforum.org/node/3319
http://aptosid.com/index.php?name=PNphpBB2&file=viewtopic&t=2254
http://ask.libreoffice.org/en/question/1429/libreoffice-3512-menu-problem/
http://www.oooforum.org/forum/viewtopic.phtml?t=124957
------- Comment #14 From cendio 2013-05-07 13:16:19 -------
(In reply to comment #12)
> Libreoffice is exhibiting some redraw problem with its GTK labels (menus,
> dialogs, etc.) on at least Fedora 18 and Ubuntu 12.04.

Upstream report:

https://bugs.freedesktop.org/show_bug.cgi?id=57814
------- Comment #15 From cendio 2013-05-07 13:27:08 -------
Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
------- Comment #16 From cendio 2013-05-07 16:28:26 -------
(In reply to comment #15)
> Seems to be a VNC thing. Taking a screen shot shows all the entries in place.

Fixed in r27336.
------- Comment #17 From cendio 2013-05-08 13:28:09 -------
(In reply to comment #11)
> With the latest nightly build, I do not get any mouse pointer when starting a
> new session, ie in the profile selection dialog. After downgrading to
> http://www.cendio.com/downloads/updates/b4547/ it works. 
> 
> Perhaps this has something to do with the fact that I'm using some fancy high
> color cursors. I activated this a long time ago. I have no idea where this
> setting is stored, but it's probably in my home dir somewhere.

The problem was animated cursors. It has now been fixed in r27349.
------- Comment #18 From cendio 2013-05-16 12:12:43 -------
(In reply to comment #16)
> (In reply to comment #15)
> > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> 
> Fixed in r27336.

Verified on SLED 11 using build 3945
------- Comment #19 From cendio 2013-05-17 09:19:13 -------
Ran the freedesktop XTS test suite using ThinLinc build 3949 on Ubuntu 12.04.

Xvnc didn't crash and XTS completed successfully with the following result:

========================
145 of 1007 tests failed
(275 tests were not run)
========================
------- Comment #20 From cendio 2013-05-20 13:41:52 -------
Found when testing Bug 2968. 

On RHEL6, the xserver crashes when trying to run:

LIBGL_ALWAYS_INDIRECT=1 compiz --replace

Core file here: 
/home/astrand/tmp/core.5559.gz
------- Comment #21 From cendio 2013-05-21 08:59:03 -------
(In reply to comment #20)
> Found when testing Bug 2968. 
> 
> On RHEL6, the xserver crashes when trying to run:
> 
> LIBGL_ALWAYS_INDIRECT=1 compiz --replace
> 
> Core file here: 
> /home/astrand/tmp/core.5559.gz

Fixed in r27415 and reported upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=64791
------- Comment #22 From cendio 2013-05-21 09:11:38 -------
*** Bug 4649 has been marked as a duplicate of this bug. ***
------- Comment #23 From cendio 2013-05-21 09:12:42 -------
Handling bug 4649 on this bug, since it is likely a regression caused by the
Xserver upgrade:

Verified on RHEL6, nightly build, all updates. To reproduce:

* Start a new session in window mode

* Switch to full screen. I have two monitors and "all monitors" activated. 

* Switch back to window mode. 

Note, the desktop environment will show a window "Starting up file manager...".
Then there's another one, and another one, etc...  dmesg shows:

__ratelimit: 6 callbacks suppressed
nautilus[15932]: segfault at 8 ip 00007fd519e05c84 sp 00007fffaf239ef0 error 4
in libgnome-desktop-2.so.11.4.2[7fd519dee000+28000]
------- Comment #24 From cendio 2013-05-21 12:38:08 -------
naev (a simple 3D space game), glxgears and Google Earth were tested with both
indirect and direct rendering without any problems.
------- Comment #25 From cendio 2013-05-22 13:34:40 -------
(In reply to comment #23)
> Handling bug 4649 on this bug, since it is likely a regression caused by the
> Xserver upgrade:
> 
> Verified on RHEL6, nightly build, all updates. To reproduce:
> 
> * Start a new session in window mode
> 
> * Switch to full screen. I have two monitors and "all monitors" activated. 
> 
> * Switch back to window mode. 
> 
> Note, the desktop environment will show a window "Starting up file manager...".
> Then there's another one, and another one, etc...  dmesg shows:
> 
> __ratelimit: 6 callbacks suppressed
> nautilus[15932]: segfault at 8 ip 00007fd519e05c84 sp 00007fffaf239ef0 error 4
> in libgnome-desktop-2.so.11.4.2[7fd519dee000+28000]

This is messy. First off, the reason it happens now is because of GTK+
requiring RandR 1.3, even though 1.2 would be sufficient for what it wants to
do. That's why we didn't see this in ThinLinc 4.0.0.

Nautilus crashes because it is trying to display a background on the disabled
output/crtc. But since it has no mode it gets 0x0 as the dimensions. This
screws up its internal logic, which results in NULL for the background pixmap,
and the subsequent code always expects a valid pointer.

There are two things that our RandR code does differently compared to a "real"
server, and fixing either would make nautilus behave properly:

 a) We always pretend that an output is connected. Nautilus ignores
disconnected outputs (even if they might still be in use).

 b) We clear the mode of the CRTC when "disabling" an output. A "real" server
also disassociates the output from the CRTC.


Fixing a) could be done by automatically toggling the connection state
depending on if an output has a CRTC and/or mode set. This is not quite the
same as what would happen with a "real" server, but we have no external events
to map connection state to, so it might have to be good enough.

Fixing b) is a bit trickier as we need to update the code that is also shared
with libvnc.so. That code would need to become more clever in figuring out a
free CRTC to connect to. Doable, but might not be trivial.
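
A rough model of option a), only to illustrate the rule. This is not the RandR
code in Xvnc; the class names, fields and output names are hypothetical. An
output is reported as connected only while it has a CRTC with a mode set, and a
client that skips disconnected outputs (as nautilus does) then never sees the
0x0 one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Crtc:
    # (width, height) of the current mode, or None when the CRTC is disabled.
    mode: Optional[Tuple[int, int]] = None


@dataclass
class Output:
    name: str
    crtc: Optional[Crtc] = None


def is_connected(output: Output) -> bool:
    """Derive the connection state from the CRTC and mode, as in option a)."""
    return output.crtc is not None and output.crtc.mode is not None


def outputs_to_draw(outputs: List[Output]) -> List[str]:
    """Mimic a client that ignores disconnected outputs."""
    return [o.name for o in outputs if is_connected(o)]


outputs = [
    Output("VNC-0", Crtc(mode=(1024, 768))),  # active output
    Output("VNC-1", Crtc(mode=None)),         # "disabled": CRTC kept, mode cleared
    Output("VNC-2"),                          # no CRTC at all
]
print(outputs_to_draw(outputs))
```

With this rule the "disabled" output from the reproduction steps drops out of
the list, so the client never asks for a background on a 0x0 area.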
------- Comment #26 From cendio 2013-05-22 13:38:19 -------
(In reply to comment #17)
> 
> The problem was animated cursors. It has now been fixed in r27349.

Broken for older X servers:

r5090 breaks when compiled against Xorg 7.5:

Making all in vnc
  CXX   xf86vncModule.o
  CXX   vncExtInit.o
  CXX   vncHooks.o
  CXX   XserverDesktop.o
vncExtInit.cc: In function 'void vncExtensionInit()':
vncExtInit.cc:197: warning: deprecated conversion from string constant to
'char*'
vncExtInit.cc: In function 'int ProcVncExtListParams(_Client*)':
vncExtInit.cc:762: warning: unused variable 'stuff'
vncExtInit.cc: In function 'int ProcVncExtGetClientCutText(_Client*)':
vncExtInit.cc:857: warning: unused variable 'stuff'
vncExtInit.cc: In function 'int ProcVncExtGetQueryConnect(_Client*)':
vncExtInit.cc:1010: warning: unused variable 'stuff'
vncHooks.cc: In function 'void GlyphRegion(int, _GlyphList*, _Glyph**,
pixman_region16*)':
vncHooks.cc:675: error: 'RegionUninit' was not declared in this scope
vncHooks.cc:697: error: 'RegionInitBoxes' was not declared in this scope
vncHooks.cc: In function 'void vncHooksGlyphs(CARD8, _Picture*, _Picture*,
_PictFormat*, INT16, INT16, int, _GlyphList*, _Glyph**)':
vncHooks.cc:721: error: 'RegionTranslate' was not declared in this scope
vncHooks.cc:728: error: 'RegionInit' was not declared in this scope
vncHooks.cc:730: error: 'RegionIntersect' was not declared in this scope
vncHooks.cc:732: error: 'RegionUninit' was not declared in this scope
vncHooks.cc: In function 'void vncPostScreenResize(_Screen*, Bool)':
vncHooks.cc:784: warning: the address of 'box' will never be NULL
make[3]: *** [libvnccommon_la-vncHooks.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
XserverDesktop.cc: In member function 'virtual unsigned int
XserverDesktop::setScreenLayout(int, int, const rfb::ScreenSet&)':
XserverDesktop.cc:981: warning: 'crtc' may be used uninitialized in this
function
make[2]: *** [all] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1
------- Comment #27 From cendio 2013-05-22 14:49:53 -------
(In reply to comment #16)
> (In reply to comment #15)
> > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> 
> Fixed in r27336.

Actually it was this that broke things for older Xorg.
------- Comment #28 From cendio 2013-05-22 15:05:59 -------
(In reply to comment #25)
> 
> There are two things that our RandR code does differently compared to a "real"
> server, and fixing either would make nautilus behave properly:
> 

Apparently I had been a bit proactive when I originally did this so it was
easier to fix than expected. Sorted out in r27434.
------- Comment #29 From cendio 2013-05-22 15:06:45 -------
(In reply to comment #27)
> (In reply to comment #16)
> > (In reply to comment #15)
> > > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> > 
> > Fixed in r27336.
> 
> Actually it was this that broke things for older Xorg.

Also fixed in r27434. Tester needs to recheck that Libreoffice works.
------- Comment #30 From cendio 2013-05-24 07:02:43 -------
(In reply to comment #29)
> (In reply to comment #27)
> > (In reply to comment #16)
> > > (In reply to comment #15)
> > > > Seems to be a VNC thing. Taking a screen shot shows all the entries in place.
> > > 
> > > Fixed in r27336.
> > 
> > Actually it was this that broke things for older Xorg.
> 
> Also fixed in r27434. Tester needs to recheck that Libreoffice works.

Libreoffice verified
------- Comment #31 From cendio 2013-05-24 08:01:05 -------
The first pass running the piglit test suite with direct rendering got stuck on
a test after a few hours. It didn't crash Xvnc, but there were a few crashes in
the piglit tools along the way.

The second pass, running piglit with indirect rendering, crashes Xvnc almost
immediately with the following backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037",
need_swap=0 '\000')
    at single2.c:349
349        string = (const char *) CALL_GetString(GET_DISPATCH(), (name));
(gdb) bt
#0  0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037",
need_swap=0 '\000')
    at single2.c:349
#1  0x00000000004a7c54 in __glXDispatch (client=<optimized out>) at
glxext.c:581
#2  0x000000000056394e in Dispatch () at dispatch.c:432
#3  0x000000000055106a in main (argc=<optimized out>, argv=0x7fffffffe2c8,
envp=<optimized out>)
    at main.c:295
(gdb)
------- Comment #32 From cendio 2013-05-24 14:10:56 -------
(In reply to comment #31)
> 
> The second pass, running piglit with indirect rendering, crashes Xvnc almost
> immediately with the following backtrace:
> 

The problem is this test:

[Fri May 24 14:07:42 2013] ::  running ::
spec/EXT_packed_depth_stencil/depthstencil-render-miplevels 1024
d=z24_s8_s=z24_s8

And the crash is caused by the GLX function dispatch table being set to NULL.
Probably because of a bug in the test case, but it still shouldn't be able to
crash the server.
------- Comment #33 From cendio 2013-05-27 09:03:20 -------
(In reply to comment #32)
> (In reply to comment #31)
> > 
> > The second pass, running piglit with indirect rendering, crashes Xvnc almost
> > immediately with the following backtrace:
> > 
> 
> The problem is this test:
> 
> [Fri May 24 14:07:42 2013] ::  running ::
> spec/EXT_packed_depth_stencil/depthstencil-render-miplevels 1024
> d=z24_s8_s=z24_s8
> 
> And the crash is caused by the GLX function dispatch table being set to NULL.
> Probably because of a bug in the test case, but it still shouldn't be able to
> crash the server.

Scratch that. The problem only occurs when you are running multiple OpenGL
programs in parallel (which piglit does by default). Something must be broken
with how the GLX code switches between client contexts.
------- Comment #34 From cendio 2013-05-27 11:27:42 -------
(In reply to comment #33)
> 
> Scratch that. The problem only occurs when you are running multiple OpenGL
> programs in parallel (which piglit does by default). Something must be broken
> with how the GLX code switches between client contexts.

The X server actually had an extra safety net to protect against this scenario.
It was removed some time ago though because "it wasn't needed". Bug filed
upstream:

https://bugs.freedesktop.org/show_bug.cgi?id=65030

Patched back the safety net in r27449. Going to have one more look to see if I
can fix the underlying problem.
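
For illustration only, a generic form such a guard can take. This is a toy
model, not the X server's GLX code and not necessarily what r27449 restores;
all names are hypothetical. Before dispatching a request, the server checks
that the requesting client's context is the current one and that a dispatch
table is present, and re-establishes it (or fails the request) instead of
calling through a NULL table.

```python
class BadContextError(Exception):
    """The request cannot be served because the client has no usable context."""


class GlxState:
    """Tracks which client's context (and function table) is current."""

    def __init__(self):
        self.current_client = None
        self.dispatch = None

    def make_current(self, client_id, table):
        self.current_client = client_id
        self.dispatch = table

    def lose_current(self):
        self.current_client = None
        self.dispatch = None


def do_get_string(state, client_id, tables, name):
    """Serve a GetString request, guarding against a stale or missing table."""
    if state.current_client != client_id or state.dispatch is None:
        table = tables.get(client_id)
        if table is None:
            # Fail this one request rather than crash the whole server.
            raise BadContextError(f"client {client_id} has no context")
        state.make_current(client_id, table)
    return state.dispatch["GetString"](name)


# Two clients taking turns, as when several OpenGL programs run in parallel.
tables = {
    1: {"GetString": lambda name: f"client1:{name}"},
    2: {"GetString": lambda name: f"client2:{name}"},
}
state = GlxState()
first = do_get_string(state, 1, tables, "VENDOR")
state.lose_current()  # simulate the table being reset behind our back
second = do_get_string(state, 2, tables, "VENDOR")
```

The point of the guard is that a lost or mismatched context costs one extra
context switch, or one failed request, rather than a segfault in the server.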
------- Comment #35 From cendio 2013-05-28 16:15:36 -------
(In reply to comment #34)
> 
> Patched back the safety net in r27449. Going to have one more look to see if I
> can fix the underlying problem.

Should hopefully be fixed in r27457.
------- Comment #36 From cendio 2013-05-30 10:40:04 -------
(In reply to comment #31)
> The first pass running the piglit test suite with direct rendering got stuck on
> a test after a few hours. It didn't crash Xvnc, but there were a few crashes in
> the piglit tools along the way.
> 
> The second pass, running piglit with indirect rendering, crashes Xvnc almost
> immediately with the following backtrace:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037",
> need_swap=0 '\000')
>     at single2.c:349
> 349        string = (const char *) CALL_GetString(GET_DISPATCH(), (name));
> (gdb) bt
> #0  0x00000000004b4fed in DoGetString (cl=0xdfa578, pc=0x431eff8 "\003\037",
> need_swap=0 '\000')
>     at single2.c:349
> #1  0x00000000004a7c54 in __glXDispatch (client=<optimized out>) at
> glxext.c:581
> #2  0x000000000056394e in Dispatch () at dispatch.c:432
> #3  0x000000000055106a in main (argc=<optimized out>, argv=0x7fffffffe2c8,
> envp=<optimized out>)
>     at main.c:295
> (gdb)

Updated my Ubuntu 12.04 machine to build 3966 and restarted a piglit test using
indirect rendering. The test completed without crashing Xvnc.
------- Comment #37 From cendio 2013-05-31 13:01:50 -------
Closing this bug since no more issues have been found and I'm out of ideas for
testing this further.