My words on free/open source software

Saturday, November 05, 2016

Dell Precision 15 7510 Running CentOS 7.5

I set up my new laptop/mobile workstation. It's a certified refurbished Dell Precision 15 7510 from eBay. I'm pretty satisfied with the hardware as it looked brand new to me, although FedEx's shipping caused me some confusion and trouble. I installed CentOS 7.2 and am happy with the results.

Hardware

I purchased this latest Skylake model because I checked dell.com and was confident that it has good Ubuntu support. The following is a summary of this laptop's hardware:

  • 6th Generation Intel Core i7-6820HQ (Quad Core 2.70GHz, 3.60GHz Turbo, 8MB 45W, w/Intel HD Graphics 530)
  • Graphics: 4GB Nvidia Quadro M2000M
  • 15.6" UltraSharp UHD IGZO (3840x2160) Wide View Anti-Glare LED-backlit
  • HD1: 512GB M.2 Samsung PCIe NVMe Class 40 Solid State Drive
  • HD2: SK hynix SC210 2.5-inch 7 mm 512GB SATA SSD
  • Intel Dual-Band Wireless-AC 8260 Wi-Fi with Bluetooth 4.1 Wireless Card (2x2)
  • Webcam: 0x1bcf Sunplus Innovation Technology Inc.

Storage Performance: RAID-0 slower than single disk?

The new laptop has two drives:

  1. 512 GB M.2 PCIe NVMe Class 40 Solid State Drive
  2. 512 GB 2.5 inch High Performance SATA Class 30 Solid State Drive

I want CentOS to run fast, so it belongs on the NVMe drive. I was curious whether RAID-0 could give me an even faster partition, so I set up Linux software RAID using two partitions of the same size, one on each drive, and compared its performance with a regular XFS partition on the NVMe drive. The result was a surprise.

First, mkfs.xfs ran very slowly on the RAID-0 partition: it took about 30 minutes to finish, while on a non-RAID partition it needed only a few seconds.

I then ran fio using the ssd-test.fio workload. I changed the run time to 300 seconds to get more accurate numbers. The results are:

The RAID-0 chunk size was 128 kB. The file system was XFS with a 4096-byte sector size. You can see that, without encryption, RAID-0 is actually slower than the NVMe drive alone in every workload, and that LUKS encryption has a pretty big impact on I/O throughput.
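If you want to repeat these runs, the only change the stock job file needs is the runtime, plus pointing it at the file system under test. Below is a minimal shell sketch of that edit. It assumes the job file already sets runtime= and directory= on their own lines, as the ssd-test.fio example shipped with fio does in its [global] section. The function name make_fio_job and the paths in the comments are hypothetical.

```shell
# Hypothetical helper: write a copy of an fio job file with runtime= and
# directory= overridden. It only rewrites lines that already start with
# those keys; it never adds missing keys.
#   $1: source job file   $2: destination job file
#   $3: runtime in seconds   $4: mount point of the file system under test
make_fio_job() {
    local src=$1 dst=$2 runtime=$3 dir=$4
    sed -e "s|^runtime=.*|runtime=${runtime}|" \
        -e "s|^directory=.*|directory=${dir}|" \
        "$src" > "$dst"
}

# Example (placeholder paths): 300-second runs against a RAID-0 mount point
#   make_fio_job ssd-test.fio ssd-test-300.fio 300 /mnt/raid0
#   fio ssd-test-300.fio
```

Rewriting a copy instead of editing the original keeps the stock job file intact for runs against the plain NVMe partition.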

Setting It Up

  1. Updated BIOS to the latest version. This can be done in Windows or using a FreeDOS USB if your Windows is gone.
  2. Disabled Secure Boot in the BIOS. It is possible to install CentOS with Secure Boot enabled, but I didn't want to deal with it right now.
  3. Disabled Intel Rapid Storage Technology and switched SATA to AHCI mode. I tried accessing the Intel RST disk from a Fedora 25 Beta live CD, and even its 4.8.0 kernel couldn't recognize it.
  4. Disabled the built-in Intel graphics and used only the nVIDIA card, because there's no easy way to switch between graphics cards live under X yet.
  5. Installed all Windows 10 Pro updates and cleaned up unnecessary bits to reduce the Windows footprint:
    1. Uninstall unneeded software: Intel management tools and any trialware.
    2. Turn off unneeded Windows features (found in the "Uninstall a program" control panel module).
    3. Disable hibernation by running powercfg /hibernate off as administrator. This frees up space roughly equal to your RAM size (around 32 GB here).
    4. Reduce the page file size if needed.
    5. Install WinDirStat to find any other useless files.
    6. Disk Cleanup -> Clean up system files -> More Options -> clean up old restore points.
    7. Run a full system disk cleanup.
  6. Created a Windows recovery USB drive (a 16 GB drive is needed) in case I need to recover Windows later.
  7. Luckily, the preinstalled Windows 10 Pro was on the SATA drive, so I had the whole NVMe drive for CentOS. I used GParted to shrink the Windows partition so that CentOS could store more on the SATA drive. There are three or four other small partitions on the SATA drive for UEFI booting, and I left them alone.
  8. Ran memcheck to make sure the RAM was in good shape (it also confirmed that my machine isn't vulnerable to the Row Hammer attack).
  9. Booted a CentOS 7 live USB.
  10. Created partitions on /dev/nvme0n1 using fdisk, because I didn't know how Anaconda aligns partitions.
    • nvme0n1p1: EFI partition (150 MB), nvme0n1p2: /boot (2 GB), nvme0n1p3: / (70 GB), nvme0n1p4: swap (16 GB), nvme0n1p5: /home (389 GB)
    • I was overly cautious and used fdisk to make sure all partitions were 4 MiB aligned (parted also has alignment checking, but I couldn't figure out which alignment size it was using).
    • Created LUKS partitions and made XFS file systems on top of them. I don't need LVM and wasn't sure whether Anaconda would let me skip it, so I created the LUKS partitions and file systems myself. I used "mkfs.xfs -s size=4096" to create XFS with a 4096-byte sector size.
    • I'm not sure this is correct, but the NVMe drive reports its physical sector size as 512 bytes (/sys/block/nvme0n1/queue/physical_block_size), so by default mkfs.xfs creates file systems with a 512-byte sector size. This is probably harmless, but I'm a little paranoid and wanted a 4096-byte sector size, since all my other SSDs report 4096. I created the /home partition myself and told Anaconda not to format it. However, Anaconda insisted on formatting the / partition itself and provided no easy way to pass parameters to mkfs. So I renamed /usr/sbin/mkfs.xfs to /usr/sbin/mkfs.xfs.orig and put a script in its place that always passes -s size=4096 to the original binary. After installation I checked with xfs_info, and all newly created XFS partitions use a 4096-byte sector size.
  11. After the installation finished, the system wouldn't boot. efibootmgr did create a new "CentOS Linux" entry pointing to /EFI/somedrive/centos/shim.efi, but on the WRONG drive. In my case it pointed to the SATA drive with the Windows EFI partition instead of the NVMe drive hosting the Linux EFI partition. This is very likely an Anaconda bug: it uses whichever EFI partition it detects first and assumes the system has only one. Finding the cause cost me some time, but once you know the reason the fix is easy: boot into the BIOS setup and create a new UEFI boot entry that points to the right shim.efi on the right drive. Dell's BIOS is quite powerful and can handle all sorts of mixed UEFI/legacy boot setups.
  12. Installed the kernel-ml package (the mainline kernel from ELRepo), since CentOS 7.2 doesn't support Skylake yet. CentOS 7.3 and later have Skylake support out of the box.
  13. HiDPI (my laptop has a 4K internal display) works great after installing the mainline kernel.
  14. Added rd.luks.options=discard to /etc/default/grub and discard to /etc/crypttab to enable LUKS discard support. This lets the LUKS device-mapper layer pass discard/TRIM commands from the file system above down to the drives, which helps keep them running at optimal performance. The downside is that this makes the encryption somewhat less secure, because it leaks the locations of unused blocks (not a big deal for me).
  15. I didn't mount XFS with the discard option, because that slows down daily operation and the XFS FAQ warns against it. Instead, I enabled systemd's weekly fstrim timer by running: systemctl enable fstrim.timer
  16. Installed iwl7260-firmware-25.30.13.0-68.fc25.noarch from Fedora 25 to provide the firmware for the Intel 8260 wireless card.
  17. Installed the nVIDIA proprietary driver 370.28 (the latest version at the time).
  18. Connected my external low-DPI monitor. By default everything looked huge on it, because the 2.0 scaling factor GNOME picks for the internal 4K display also applies to the external monitor. In nvidia-settings I enabled panning and changed the scaling on the external monitor. Its resolution is 1920x1200, so I set panning to 3840x2400 and ViewPortIn to 3840x2400, twice the native size in each direction. After that the external screen went blank because of a bug, so I also had to run this command to enable "ForceFullCompositionPipeline=On":
    nvidia-settings --assign CurrentMetaMode="DPY-5: nvidia-auto-select @3840x2160 +0+0 {ViewPortIn=3840x2160, ViewPortOut=3840x2160+0+0}, DPY-2: nvidia-auto-select @3840x2400 +3840+0 {ViewPortIn=3840x2400, ViewPortOut=1920x1200+0+0, ForceFullCompositionPipeline=On}"
  19. Used PowerTOP to check that the CPU can actually enter its deepest idle state (C10). If it can't, your CPU might wear out faster.
           Package |           Core |          CPU 0           CPU 4
    POLL      0.0% | POLL      0.0% | POLL      0.0%   0.0 ms   0.0%   0.0 ms
    C1E-SKL   0.8% | C1E-SKL   1.3% | C1E-SKL   1.1%   0.4 ms   1.6%   3.2 ms
    C3-SKL    0.4% | C3-SKL    1.1% | C3-SKL    2.1%   0.4 ms   0.1%   0.4 ms
    C6-SKL    7.9% | C6-SKL   13.3% | C6-SKL   25.0%   0.7 ms   1.7%   1.1 ms
    C7s-SKL   0.0% | C7s-SKL   0.0% | C7s-SKL   0.0%   0.0 ms   0.0%   0.0 ms
    C8-SKL   29.0% | C8-SKL   31.0% | C8-SKL   43.4%   1.6 ms  18.6%   3.3 ms
    C9-SKL    0.0% | C9-SKL    0.0% | C9-SKL    0.0%   0.0 ms   0.0%   0.0 ms
    C10-SKL  51.0% | C10-SKL  44.0% | C10-SKL  17.0%   5.1 ms  71.0%  16.9 ms
    
  20. Updated on Nov. 24th, 2016: I discovered that the headphone jack had no sound. I fixed this by creating /etc/modprobe.d/alsa-base.conf with the following content:
    options snd-pcsp index=-2
    alias snd-card-0 snd-hda-intel
    alias sound-slot-0 snd-hda-intel
    options snd-hda-intel model=laptop
    options snd-hda-intel position_fix=1 enable=yes
    
    I heard that you need to shut the laptop down completely and then start it again for this to take effect. I got the instructions from here.
  21. The brightness of the internal display is not remembered across reboots. Per Soon's comment, you can set a fixed value at boot by adding:
    echo 100 > /sys/class/backlight/acpi_video0/brightness
    chmod 666 /sys/class/backlight/acpi_video0/brightness
    
    to /etc/rc.d/rc.local and then making it executable with chmod +x /etc/rc.d/rc.local.
  22. Updated on Apr. 25th, 2017: Setting the screen backlight to maximum brightness when working in direct sun. CentOS 7.3's kernel uses ACPI backlight control by default. It works great but cannot reach the maximum possible brightness: its highest level is a little darker than what the hardware can do. For now, to get the maximum level you can reboot, set the brightness to max in the BIOS, and disable any brightness setting you added at boot (such as the one above). Once the system is up, if you change the brightness level you won't be able to get back to max again. The other workaround is to add the acpi_backlight=vendor kernel option, which switches to the dell_backlight driver. The dell_backlight driver, however, is buggy: it fixes the brightness at the maximum level and you can't change it.
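
For the 4 MiB alignment in step 10, you don't have to trust any partitioning tool. The kernel exposes each partition's first sector in /sys/block/<disk>/<partition>/start, counted in 512-byte units, so a partition is 4 MiB aligned when that number is a multiple of 8192. A minimal sketch of that check follows. The function names are hypothetical, and the optional second argument exists only so the check can be tried on a fake sysfs tree.

```shell
# 4 MiB in 512-byte sectors: 4 * 1024 * 1024 / 512 = 8192
ALIGN_SECTORS=8192

# Return success if a start sector (in 512-byte units) is 4 MiB aligned.
is_4mib_aligned() {
    [ $(( $1 % ALIGN_SECTORS )) -eq 0 ]
}

# Print the alignment of every partition of a disk, e.g. check_alignment nvme0n1.
#   $1: disk name as it appears under /sys/block
#   $2: sysfs block root (default /sys/block; overridable for tests)
check_alignment() {
    local disk=$1 root=${2:-/sys/block} part start
    for part in "$root/$disk/$disk"*; do
        # Skip anything that is not a partition directory with a start file.
        [ -r "$part/start" ] || continue
        start=$(cat "$part/start")
        if is_4mib_aligned "$start"; then
            echo "${part##*/} start=$start aligned"
        else
            echo "${part##*/} start=$start NOT aligned"
        fi
    done
}
```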
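
The mkfs.xfs trick from step 10 needs only a tiny wrapper script. The helper below is a minimal sketch of one way to write such a wrapper, not the original script. It assumes the real binary has already been renamed to mkfs.xfs.orig in the same directory and that Anaconda never passes its own -s option. The function name install_mkfs_xfs_wrapper is hypothetical. Because it takes the directory as an argument, you can try it on a scratch directory first.

```shell
# Write a mkfs.xfs wrapper into DIR that always adds "-s size=4096" and then
# passes every original argument on to DIR/mkfs.xfs.orig (the renamed binary).
# Usage on the live installer, as root:
#   mv /usr/sbin/mkfs.xfs /usr/sbin/mkfs.xfs.orig
#   install_mkfs_xfs_wrapper /usr/sbin
install_mkfs_xfs_wrapper() {
    local dir=$1
    # $dir is expanded now; \$@ stays literal so the wrapper forwards its args.
    cat > "$dir/mkfs.xfs" <<EOF
#!/bin/sh
# Wrapper: always request a 4096-byte sector size.
exec "$dir/mkfs.xfs.orig" -s size=4096 "\$@"
EOF
    chmod 755 "$dir/mkfs.xfs"
}
```

After installation, xfs_info on each new file system should report sectsz=4096.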
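
The two edits in step 14 can also be scripted. This is a minimal sketch under a few assumptions:

- GRUB_CMDLINE_LINUX in /etc/default/grub is a single double-quoted line.
- /etc/crypttab uses the usual four fields (name, device, key file, options).
- GNU sed is available for in-place editing.

The function names are hypothetical, and it's worth diffing both files before rebooting. The grub2-mkconfig output path in the comments is the usual one for CentOS 7 booted via UEFI.

```shell
# Append rd.luks.options=discard to GRUB_CMDLINE_LINUX once (idempotent).
grub_add_luks_discard() {
    sed -i '/^GRUB_CMDLINE_LINUX=/{/rd\.luks\.options=discard/!s/"$/ rd.luks.options=discard"/;}' "$1"
}

# Add "discard" to the options (4th) field of every crypttab entry, filling in
# "none" for a missing key-file field. Comment and blank lines are copied as-is;
# modified lines are re-joined with single spaces.
crypttab_add_discard() {
    local tmp
    tmp=$(mktemp) || return 1
    awk '
        /^[ \t]*#/ || NF == 0 { print; next }
        {
            if (NF < 3) $3 = "none"
            if (NF < 4) $4 = "discard"
            else if (("," $4 ",") !~ /,discard,/) $4 = $4 ",discard"
            print
        }' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# As root:
#   grub_add_luks_discard /etc/default/grub
#   crypttab_add_discard /etc/crypttab
#   grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
```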
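
The panning numbers in step 18 are simply the external monitor's native resolution multiplied by GNOME's scale factor of 2. A 1920x1200 panel gets a 3840x2400 ViewPortIn, which is then scaled down to the 1920x1200 ViewPortOut. The monitor sits at x offset 3840, the width of the internal panel. The hypothetical helper below builds that MetaMode entry from those four numbers. It's only a sketch: the DPY-5/DPY-2 names in the usage comment are the ones from my machine, and yours have to be read from nvidia-settings.

```shell
# Build the MetaMode entry for an external display rendered at SCALE times its
# native resolution and scaled back down to fit the panel.
#   $1 $2: native width and height of the external display (ViewPortOut)
#   $3:    integer scale factor (2 to match GNOME's 2.0 HiDPI scaling)
#   $4:    x offset, i.e. the width of the display to its left
external_metamode() {
    local w=$1 h=$2 scale=$3 xoff=$4
    local vw=$((w * scale)) vh=$((h * scale))
    printf 'nvidia-auto-select @%dx%d +%d+0 {ViewPortIn=%dx%d, ViewPortOut=%dx%d+0+0, ForceFullCompositionPipeline=On}\n' \
        "$vw" "$vh" "$xoff" "$vw" "$vh" "$w" "$h"
}

# Rebuilds the step 18 command for a 1920x1200 monitor right of the 4K panel:
#   nvidia-settings --assign CurrentMetaMode="DPY-5: nvidia-auto-select @3840x2160 +0+0 {ViewPortIn=3840x2160, ViewPortOut=3840x2160+0+0}, DPY-2: $(external_metamode 1920 1200 2 3840)"
```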
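
PowerTOP is the easy way to do step 19, but the per-core idle states in its right-hand columns also come from the kernel's cpuidle sysfs interface. There, /sys/devices/system/cpu/cpuN/cpuidle/stateM/ holds each state's name and its cumulative residency in microseconds (the time file). The sketch below prints each state's share of one CPU's total idle-state residency since boot. That is not the same percentage PowerTOP shows, since PowerTOP measures over a short window. Package C-states are not in this interface, and the function name is hypothetical.

```shell
# Print each cpuidle state of one CPU with its share of that CPU's total
# idle-state residency since boot (integer percent, rounded down).
#   $1: CPU number (default 0)
#   $2: sysfs CPU root (default /sys/devices/system/cpu; overridable for tests)
idle_residency() {
    local cpu=${1:-0} root=${2:-/sys/devices/system/cpu}
    local dir total=0 s
    dir="$root/cpu$cpu/cpuidle"
    # First pass: sum the cumulative residency (microseconds) of all states.
    for s in "$dir"/state*; do
        [ -r "$s/time" ] || continue
        total=$((total + $(cat "$s/time")))
    done
    [ "$total" -gt 0 ] || { echo "no idle residency recorded" >&2; return 1; }
    # Second pass: print each state's name and its share of the total.
    for s in "$dir"/state*; do
        [ -r "$s/time" ] || continue
        printf '%-10s %5d%%\n' "$(cat "$s/name")" "$(( $(cat "$s/time") * 100 / total ))"
    done
}

# Example: idle_residency 0   (on an idle machine, look for a large C10-SKL share)
```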
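
The rc.local lines in step 21 hard-code 100, which means full brightness only if the interface's max_brightness happens to be 100. A slightly more general sketch reads max_brightness first and writes a percentage of it. It assumes the standard backlight class files (brightness and max_brightness) under /sys/class/backlight/acpi_video0. Like the original lines, it must run as root, and per step 22 it still can't exceed the ACPI interface's own maximum. The function name is hypothetical.

```shell
# Set a backlight to a percentage of its max_brightness.
#   $1: integer percentage (values above 100 are treated as 100)
#   $2: backlight directory (default /sys/class/backlight/acpi_video0)
set_backlight_percent() {
    local pct=$1 dir=${2:-/sys/class/backlight/acpi_video0} max
    max=$(cat "$dir/max_brightness") || return 1
    case $pct in
        ''|*[!0-9]*) echo "percentage must be a non-negative integer" >&2; return 1 ;;
    esac
    # Strip leading zeros so shell arithmetic doesn't read e.g. 050 as octal.
    pct=${pct#"${pct%%[!0]*}"}
    pct=${pct:-0}
    [ "$pct" -le 100 ] || pct=100
    echo $((max * pct / 100)) > "$dir/brightness"
}

# In /etc/rc.d/rc.local (which must be executable), after pasting the function:
#   set_backlight_percent 100
```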

End Result

Everything else works well, including suspend-to-RAM, external display, HDMI, DisplayPort, webcam, sound, Wi-Fi, SD Card Reader, and USB-C.

Some nitpicks

  • The internal display's brightness is not remembered across reboots.
  • Sometimes after suspend-to-RAM, CPU frequency scaling misbehaves and all CPUs get stuck at around 400 to 500 MHz, for example:
    analyzing CPU 0:
      driver: intel_pstate
      CPUs which run at the same hardware frequency: 0
      CPUs which need to have their frequency coordinated by software: 0
      maximum transition latency: 0.97 ms.
      hardware limits: 800 MHz - 3.60 GHz
      available cpufreq governors: performance, powersave
      current policy: frequency should be within 800 MHz and 3.60 GHz.
                      The governor "powersave" may decide which speed to use
                      within this range.
      current CPU frequency is 468 MHz.
      boost state support:
        Supported: yes
        Active: yes
    
    It doesn't happen most of the time, and I haven't figured out why. When the machine feels sluggish, I run sudo cpupower frequency-set --governor performance (bound to a hotkey). Some other people have reported the same issue.
  • I wish the battery lasted longer. I haven't had time to do any manual tuning yet. On battery I currently get about 4 hours of regular use or 1 hour of heavy use (running my CPU-heavy machine learning code). That is on par with an early review of this laptop running Windows. I'm fairly sure it can do better with some power tuning in PowerTOP.
  • Live switching between the nVIDIA and Intel graphics doesn't work, but I'm OK with using only the nVIDIA card since the Intel Skylake driver is buggy anyway. The downside is probably higher power consumption.
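
For the stuck-frequency problem, a hotkey script could first check for the symptom in the cpupower output above instead of switching the governor blindly. That symptom is a current frequency (468 MHz) below the hardware minimum (800 MHz). A minimal sketch, assuming the standard cpufreq sysfs files scaling_cur_freq and cpuinfo_min_freq (both in kHz), which intel_pstate provides. The function name is hypothetical, and this only detects the condition; it doesn't explain it.

```shell
# List CPUs whose current frequency is below their hardware minimum, which is
# what the stuck-after-resume state looks like (e.g. 468 MHz vs. 800 MHz).
#   $1: sysfs CPU root (default /sys/devices/system/cpu; overridable for tests)
# Prints one line per stuck CPU; returns 0 if at least one CPU looks stuck.
stuck_cpus() {
    local root=${1:-/sys/devices/system/cpu} f cur min found=1
    for f in "$root"/cpu[0-9]*/cpufreq; do
        [ -r "$f/scaling_cur_freq" ] && [ -r "$f/cpuinfo_min_freq" ] || continue
        cur=$(cat "$f/scaling_cur_freq")
        min=$(cat "$f/cpuinfo_min_freq")
        if [ "$cur" -lt "$min" ]; then
            echo "$(basename "$(dirname "$f")"): ${cur} kHz < ${min} kHz"
            found=0
        fi
    done
    return "$found"
}

# Example hotkey command:
#   stuck_cpus && sudo cpupower frequency-set --governor performance
```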

Some random thoughts. This new laptop's nVIDIA graphics feels much faster than the Intel graphics in my old Late 2013 Retina MacBook Pro, and I'm only talking about 2D performance here. Under Linux, the rMBP's Intel graphics always stutters when driving one external monitor and crashes when I connect two external monitors. It drives two external monitors fine under macOS, so this is likely a software issue. I can't help but think that Intel doesn't allocate enough resources to its Linux drivers. There are other problems with Intel's driver:
  • No easy way to install different versions on RHEL/CentOS. The official installer doesn't support RHEL/CentOS yet. The nVIDIA driver, while closed source, always supports most Linux distros on the market, and I can install any version I choose.
  • Buggy and incomplete 2D acceleration support.
  • 3D performance is much slower than with the Windows driver.
  • Xorg on my old rMBP had about a 20% chance of crashing when I plugged in an external monitor. I don't know whether to blame the Xorg server or the Intel driver. However, the same Xorg with the nVIDIA driver on the new laptop hasn't crashed once so far.
Even though I applaud Intel's engagement with the open source community and its open source drivers, I feel that their quality is not on par with nVIDIA's proprietary drivers.

Updates

  • Oct. 2018: Upgraded to CentOS 7.5
  • Nov. 2017: Tested USB-C using a Google Pixel 2 XL
  • Dec. 2016: Upgraded to CentOS 7.3

About Me

Santa Cruz, California, United States