Pogoplug v4 Performance Tuning Future Enhancements

There are some exciting changes coming to the Linux kernel for both the ext4 and XFS file systems that have the potential to greatly increase I/O performance on the Pogoplug v4. As we know, one of the Pogoplug v4's main limitations is the performance of memory operations. Once the new kernel has been released for linux-kirkwood, I will begin testing these changes for possible inclusion in my performance tuning guide.
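Comparing kernels is easier with a repeatable number. Below is a minimal sketch of a sequential-write test, assuming Python 3 is available on the Pogoplug and that the target path lives on the ext4 or XFS file system under test. The function name `write_throughput_mib_s` and its 64 MiB / 1 MiB defaults are illustrative choices, not part of my tuning guide. `os.fsync` is called before the timer stops so that data still sitting in the page cache does not flatter the result.

```python
import os
import time

MiB = 1024 * 1024


def write_throughput_mib_s(path: str, total_mib: int = 64, chunk_mib: int = 1) -> float:
    """Write total_mib MiB sequentially to path and return the rate in MiB/s.

    Hypothetical helper for before/after kernel comparisons. It measures a
    normal buffered write followed by fsync(), not an O_DIRECT write.
    """
    chunk = os.urandom(chunk_mib * MiB)  # incompressible test data
    chunks = max(1, total_mib // chunk_mib)
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(chunks):
            view = memoryview(chunk)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
        os.fsync(fd)  # force the data to disk before stopping the clock
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return chunks * chunk_mib / elapsed
```

Run it once on the current kernel and once on the new one, for example `write_throughput_mib_s("/mnt/data/test.bin")` (a hypothetical mount point), then delete the test file. With only 128MiB of RAM on the device, keeping the test size modest also keeps the Python process itself small.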

I also plan to write a post about using LUKS with the Pogoplug to better secure your data. The Kirkwood has a hardware crypto engine that does work, but the mv_cesa driver has some design issues that hurt performance on the Kirkwood because it does not use DMA. Recall the issues with memory operations on this platform mentioned above.

There is some evidence that the DMA issue with mv_cesa may be fixed in the future. Stay tuned; if and when that happens, I’ll be sure to test it. If DMA is successfully implemented in mv_cesa, it should greatly increase the throughput of the hardware crypto engine and, by extension, speed up LUKS, OpenSSL, and OpenSSH.
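One way to check whether the kernel will hand a given cipher to the hardware engine at all is to look at /proc/crypto: when several drivers implement the same algorithm name, the kernel crypto API prefers the one with the highest `priority`. The sketch below parses that file and reports the preferred driver for a name such as `cbc(aes)`. It assumes the usual blank-line-separated `key : value` layout of /proc/crypto and, as a simplification, ignores self-test status. The helper names are hypothetical, and the expectation that mv_cesa's entries carry a `driver` name starting with `mv-` is an assumption to verify on your own device. This only shows which driver is selected; it says nothing about whether that driver uses DMA.

```python
from typing import Dict, List, Optional


def parse_proc_crypto(text: str) -> List[Dict[str, str]]:
    """Split /proc/crypto text into one dict per registered algorithm."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if "name" in entry:
            entries.append(entry)
    return entries


def preferred_driver(entries: List[Dict[str, str]], name: str) -> Optional[str]:
    """Return the highest-priority driver registered for `name`, if any.

    Simplification: self-test status and internal-only flags are ignored.
    """
    candidates = [e for e in entries if e.get("name") == name and "priority" in e]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e["priority"]))
    return best.get("driver")


def read_proc_crypto(path: str = "/proc/crypto") -> List[Dict[str, str]]:
    """Read and parse the live /proc/crypto file (Linux only)."""
    with open(path) as f:
        return parse_proc_crypto(f.read())
```

On the Pogoplug, `preferred_driver(read_proc_crypto(), "cbc(aes)")` would then show which implementation dm-crypt is likely to use for a CBC-based LUKS cipher; check the algorithm name that matches the cipher spec you actually choose.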

10 thoughts on “Pogoplug v4 Performance Tuning Future Enhancements”

  1. Would you use a different file system on an SSD or a pendrive because of durability concerns? If so, could you please write more about this as well? When will the new kernel be released?

    • There are file systems designed for flash, such as F2FS, which might be worth trying. That said, SSDs generally have smarter controllers and support features such as TRIM that help with wear leveling. You often need to enable discard on the file system and in LVM to get TRIM support. USB flash drives usually have very basic controllers and do not perform any wear leveling, and the same goes for flash memory cards. Flash also requires proper optimization, such as correct partition alignment and block size. Performance can be greatly diminished by an improperly aligned file system on flash media.
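A quick way to check partition alignment is to look at where the partition starts. The sketch below assumes Linux, where /sys/class/block/<partition>/start reports the start in 512-byte sectors. The 1 MiB default boundary is the one modern tools such as fdisk and parted use by default (a start sector of 2048); some flash devices have larger erase blocks, such as 4 MiB, in which case pass that value instead. The function names are hypothetical.

```python
MiB = 1024 * 1024


def partition_is_aligned(start_sector: int, sector_size: int = 512,
                         boundary: int = MiB) -> bool:
    """Return True if a partition starting at start_sector begins on a boundary.

    start_sector is in 512-byte units, as reported by sysfs. The 1 MiB
    default is a common convention, not the erase-block size of any
    particular device.
    """
    return (start_sector * sector_size) % boundary == 0


def read_partition_start(partition: str) -> int:
    """Read the start sector of a partition such as 'sda1' from sysfs."""
    with open(f"/sys/class/block/{partition}/start") as f:
        return int(f.read().strip())
```

For example, `partition_is_aligned(read_partition_start("sda1"))` returns False for the old DOS-style start at sector 63, which is the classic misalignment that hurts flash performance.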

      As for the new kernel, I am waiting on 4.2. Once Arch releases it for Kirkwood, I will test some new capabilities to see whether they can be used to improve I/O performance. I am not sure whether the mv_cesa driver DMA changes were merged for 4.2, so I will need to look into that.

      • I did some research yesterday. Better consumer-grade USB flash drives and SD cards can have similar or even better features than an average SSD. For example, my Kingston V300 120GB SSD has 150/130 MB/s sequential read/write speeds and uses MLC. A newer external drive, like the Verbatim USB 3.0 External SSD, has 450/300 MB/s and also uses MLC. So let’s compare them with pendrives and SD cards. In terms of endurance, I am not sure whether they can be as good as an SSD (SD cards certainly not), but every Kingston and SanDisk pendrive supports wear leveling according to their manuals, and the same is true for SanDisk SD cards since 2007. So I think the issue is not the software but the hardware. These consumer-grade pendrives usually contain MLC, while the really expensive ones use SLC, so you can buy endurance if you are willing to pay at least double. SD cards aren’t as durable; I think they have less electronics for that, but I am not an expert on the topic.

        Most reviews measure only read/write speed, which is very good for some of them. Some pendrive examples: Transcend JetFlash 780 – 210/75 MB/s, Kingston DataTraveler HyperX 3.0 – 225/135 MB/s, SanDisk (Cruzer) Extreme – 250/100 MB/s, SanDisk Extreme PRO – 270/250 MB/s (maybe SLC), Kingston DT Workspace – 250/250 MB/s, Lexar JumpDrive Triton – 180/190 MB/s (the 32GB model sent to reviewers was SLC, but it is uncertain whether production units use SLC or MLC), etc. Of the SD cards, I checked only the SanDisk Extreme Pro SDHC, which has about 100/90 MB/s, very good for an SD card. According to some forums it uses 24nm SLC, but I haven’t found any proof of that. According to you, an SD card is not a good choice for the PP4 anyway.

        I checked the prices: under 64GB, pendrives have the better price, and over 64GB, SSDs do. So if you need a lot of storage space, it is better to buy an SSD; buy a pendrive only if you need a small, portable solution.

        Thanks for the suggestions about the SSD. I’ll wait for the 4.2 kernel before buying. If you manage to tune the PP4 beyond 100/60 MB/s, then I’ll buy one. If not, I am afraid I will have to spend more money on a ZyXEL NAS, which has proper I/O. I look forward to your next article! Have a nice day! 🙂

      • Sorry for the flood, but I cannot edit my comments. Am I right that these “mv_cesa driver DMA changes” affect only the speed of encryption/decryption and won’t improve the low I/O speed (70MB/s read, 40MB/s write) we see on the PP4?

      • There are two sets of changes I am looking forward to. The first is some additional file system options that should be part of 4.2, which I might be able to use to get better throughput from XFS and ext4 file systems for most file operations. That is still to be determined, but it looks promising. The second is a set of changes to the mv_cesa driver which, as you say, is the driver for the hardware encryption/decryption engine on the SoC; these should dramatically speed up encryption. With security a bigger issue in the world than it has ever been, LUKS would be a great option for my Pogoplug article, but the problem is that LUKS is limited to about 5-10MB/s maximum throughput because of the overhead of not using DMA in the mv_cesa driver. So you are correct, they are two different things. I suspect what you are after is primarily better standard I/O performance for file operations, not so much the encryption.

      • Yep, currently the PP4 performs more like a USB 2 device than a SATA 2 or USB 3 one, which I don’t like. Hopefully the new kernel will be a huge improvement.

        If not, I guess only CPU overclocking would help, but as far as I understand, that is really hard to do on these devices without strong electronics skills. Of course, we also know nothing about how CPU overclocking would affect the lifetime of the device. Still, if I understand this forum thread correctly, there is some small hope for overclocking in software: http://archlinuxarm.org/forum/viewtopic.php?f=58&t=7037

      • The PP4 actually does operate well beyond USB 2 throughput, but it depends on what type of operation you’re referring to. As noted in the article, I can read at over 100MiB/s, but this doesn’t translate into 100MiB/s over CIFS with SAMBA, because of other overhead and some performance issues with memory operations that I’m hoping to bypass using the 4.2 kernel features. In real-world use, I am getting over 42MiB/s read and 26MiB/s write with SAMBA. 42MiB/s is above the real-world performance people will get out of a USB 2.0 device. If I am able to use these features to reduce unnecessary memory operations related to file system I/O, I should be able to improve on this even further.

        The current numbers are already a huge improvement over stock performance. The main thing to keep in mind with the PP4 is that we’re talking about an 800MHz ARMv5TE SoC with 128MiB of 400MHz DDR2 memory. The fact that we can achieve this level of performance and have all of these features for $18 or so is pretty amazing. Undoubtedly one can get a faster SoC with faster (and more) memory, but few devices have gigabit Ethernet and USB 3.0 until you get to much more expensive ones. There are some NAS versions of this Kirkwood SoC that have more memory (sometimes up to 512MiB) and a 1600MHz processor. That might be what the ZyXEL you refer to has, but yes, it costs more.

        Anyway, please check back once 4.2 is released and the Arch team builds the new kernel for Kirkwood. Once I see it, I’ll try to set aside some time to begin testing right away.
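Part of the confusion in this thread is units: the article reports MiB/s, while spec sheets and many reviews use decimal MB/s or Mbit/s. The small sketch below converts between them and compares a rate with a link's nominal signalling rate. The 480 Mbit/s (USB 2.0) and 1000 Mbit/s (gigabit Ethernet) figures are nominal; real-world throughput on either link is lower because of protocol overhead. The function and constant names are illustrative.

```python
MiB = 1024 * 1024  # binary megabyte, as used in the article
MB = 1000 * 1000   # decimal megabyte, as used on most spec sheets

# Nominal signalling rates; usable throughput is lower due to protocol overhead.
USB2_MBIT_S = 480
GIGABIT_ETHERNET_MBIT_S = 1000


def mib_s_to_mb_s(rate_mib_s: float) -> float:
    """Convert MiB/s to decimal MB/s."""
    return rate_mib_s * MiB / MB


def mib_s_to_mbit_s(rate_mib_s: float) -> float:
    """Convert MiB/s to decimal Mbit/s."""
    return rate_mib_s * MiB * 8 / MB


def share_of_link(rate_mib_s: float, link_mbit_s: float) -> float:
    """Fraction of a link's nominal rate that a MiB/s figure represents."""
    return mib_s_to_mbit_s(rate_mib_s) / link_mbit_s


if __name__ == "__main__":
    for label, rate in [("SAMBA read", 42), ("SAMBA write", 26), ("raw read", 100)]:
        print(f"{label}: {rate} MiB/s = {mib_s_to_mb_s(rate):.1f} MB/s = "
              f"{mib_s_to_mbit_s(rate):.0f} Mbit/s "
              f"({share_of_link(rate, USB2_MBIT_S):.0%} of USB 2.0 nominal)")
```

By this arithmetic, 42MiB/s is about 352 Mbit/s, roughly 73% of USB 2.0's nominal 480 Mbit/s, while the 100MiB/s raw read is about 839 Mbit/s, more than USB 2.0 can carry even in theory.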

    • The ZyXEL products are dedicated NAS solutions. Interestingly, the cheapest ZyXEL (about 3x the price of the PP4) has a similar architecture and the same speed problem as the PP4, according to this review: http://www.bit-tech.net/hardware/storage/2012/01/25/zyxel-nsa310-review/1

      I think it is possible to tune them the same way you tune the PP4, though they have a 1.2GHz Kirkwood CPU and 256MB of RAM. According to Alberto, it works with the more expensive ZyXEL NSA325, which costs about 6x as much as a PP4: https://linuxengineering.wordpress.com/2014/08/03/performance-tuning-with-pogoplug-v4/comment-page-1/#comment-67

      I am interested in data transfer speed over the gigabit LAN to/from a USB 3 or SATA 2 device. I read in the comments that many people get 70MB/s read and 40MB/s write without RAID 5 (which I don’t want to use). This is under 100MB/s. I guess those values were for local reads and writes to the storage devices, which is fine, but is it possible to get similar speeds over the network, e.g. with rsync? I intend to use it as a NAS + database (event storage) server for personal use. Do you think it is enough for that, or should I buy something more expensive? Of course I’ll check back; you have great articles! 🙂

      • So a few thoughts here. Please read my section on software RAID for some considerations on this device; I’d strongly recommend an external USB 3 enclosure with hardware RAID. On low-powered devices, whether the 1.2GHz or 800MHz versions, the extra writes associated with software mirroring, or the extra parity calculations with RAID 5, are going to hinder performance. My guess is that those NAS devices use software RAID for their SATA ports rather than a hardware RAID chipset. The Pogoplug v4, with USB 3, can use an external RAID controller, like I do with my IcyDock.

        Disk performance itself also comes into play: I have two different 7200RPM disks, and the high-density 2TiB drive has better throughput than the 500GiB one. The 100MiB/s read performance I mentioned on my 2TB drive was measured using O_DIRECT, and most programs do not use O_DIRECT to open files. I talk about this in my article as well, along with what it means for programs like rsync. Still, certain programs, like SAMBA, have options that let us squeeze more throughput out of the device than an average program can. This is why I am interested in the 4.2 kernel: some of its features may help us bypass these limitations associated with how programs open files for reading and writing. Also, the more interrupts you raise, such as by using both disk and network, the more overhead you have. So programs like rsync, which use disk and network and also do hashing, have additional overhead beyond a straight disk throughput test.

        Getting back to your question on NAS + database: the database part worries me a little, since the Pogoplug v4 has only 128MiB of RAM. It depends on how much memory you think the database would require and how many connections to the NAS you would have. Also, because memory operations are so expensive on this device, it may not perform well with a database running on it, depending on how active the database is. The ZyXEL, with (probably) faster and larger memory and a faster SoC, might work better in that case.
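For readers wondering what opening a file with O_DIRECT looks like from a program's side, here is a minimal Python sketch, assuming Linux and Python 3. O_DIRECT asks the kernel to bypass the page cache, which means the buffer, file offset, and transfer size must be aligned (typically to the device's logical block size); memory from an anonymous `mmap` is page-aligned, which satisfies that on common 512-byte and 4 KiB devices. The helper name `read_direct` and the 1 MiB block size are illustrative. Not every file system accepts O_DIRECT, so the sketch falls back to an ordinary buffered read when it gets EINVAL. Programs such as rsync open files the ordinary way, which is why they see the slower, cached path.

```python
import errno
import mmap
import os
from typing import Tuple

BLOCK = 1024 * 1024  # 1 MiB per read; a multiple of the 4 KiB page size


def _read_all(fd: int, buf: mmap.mmap, block: int) -> int:
    """Read fd to EOF into buf, returning the number of bytes read."""
    total = 0
    while True:
        n = os.readv(fd, [buf])
        total += n
        # A short read means EOF. With O_DIRECT the next offset would no
        # longer be aligned, so stop instead of issuing another read.
        if n < block:
            return total


def read_direct(path: str, block: int = BLOCK) -> Tuple[int, bool]:
    """Read a file with O_DIRECT if possible; return (bytes_read, used_o_direct).

    Hypothetical helper: falls back to a normal buffered read when the
    platform or file system rejects O_DIRECT with EINVAL.
    """
    o_direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
    buf = mmap.mmap(-1, block)  # anonymous mmap memory is page-aligned
    try:
        if o_direct:
            try:
                fd = os.open(path, os.O_RDONLY | o_direct)
                try:
                    return _read_all(fd, buf, block), True
                finally:
                    os.close(fd)
            except OSError as exc:
                if exc.errno != errno.EINVAL:
                    raise
        fd = os.open(path, os.O_RDONLY)
        try:
            return _read_all(fd, buf, block), False
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Timing `read_direct` on a large file and comparing it with a plain `open(...).read()` of the same file (after dropping caches) is one way to see the gap between the direct and cached paths on the Pogoplug.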

      • It’s only for personal use: 4-5 clients with a few parallel calls, nothing big… I hope it can handle that. Anyway, I’ll buy a Pogoplug Mobile, which is cheaper but has (almost) the same architecture, and test some databases on that.

        I think PostgreSQL and some simple NoSQL databases will be able to run in low memory. Certainly not Neo4j. 😀 If memory is a problem, I’ll put only the event storage on the server and run the HTTP server, query databases, etc. on the client machine. Other options are to scale out this system to multiple small Pogoplug servers, or to scale up to a more expensive server like the ZyXEL NSA325. Scaling out would be much more interesting to me; I have never done anything like that.

        Simplifying a bit, I assume that from 4.2 the kernel will use this O_DIRECT thing anyway, so it won’t matter that applications use the slower I/O path; they will get the fast one with 100MB/s. That would be great news! 🙂

        I might buy a hardware RAID enclosure later, but right now I need the money for something else, so I’ll set up automatic backups instead. Anyway, this is just a hobby project, not something serious. Thanks for the advice! 🙂
