Proof-of-Concept Mobile LiveStreaming with a Raspberry Pi 4

Early 2023 Edit: since I wrote this post, I discovered the Nvidia Jetson series of systems, and found a real winner in the Jetson Nano. I’ve been meaning to document some of my journey with the Jetson Nano but haven’t finished it yet, and a few people have asked if I was going to update things.

Original Post:

I’m an avid birder, and I enjoy evangelizing birding to basically anyone who will listen. I also like technology, so the intersection of birding and technology is something I especially like. What does this have to do with livestreaming from a Raspberry Pi? That requires a little bit of backstory.

PAX West (formerly PAX Prime, formerly PAX) has traditionally been very bad for cellular connectivity at the main Expo Hall. Having 60,000 people in one spot tends to do that to North American cellular infrastructure. While at PAX South in 2017, I started talking to a Twitch employee about the difficulties of livestreaming in busy environments or in places with marginal service. They liked the idea, mentioned that one of their team was working on it, and gave me their card. Unfortunately, my email went unanswered, and my interests shifted to birding, which I discovered shortly after PAX South.

In the fall of 2018, I started wondering whether livestreaming birding would be technically feasible without expensive (and heavy) broadcast-grade equipment, because it seemed like a great way to share a hobby I love with a wider audience. I decided that I’d need at least two cameras: one for closeups on birds and a wider camera to show the area, as well as possibly a camera showing what I saw in my binoculars. I started doing some research and found that other people had had the same sort of idea (GunRun’s IRL Backpack and various DiY versions), but the $2,500 (USD) to get a single camera going with a LiveU Solo and bandwidth via UnlimitedIRL was way out of my price range, and I’d still need some way to connect two (or more) cameras to it.

Do It Yourself (DiY)

So, what requirements did I settle on?

  • Two cameras: one wide to show the area I’m birding in and one telephoto camera to get nice closeups of birds.
  • 1080p video running at 30fps and up to 6Mb/s bitrate.
  • Ability to switch between the two cameras locally.
  • Not $2,500+.
  • Light enough that I can carry it around with me for a full day of birding.
  • Resilient network connection.

Why these requirements?
Two cameras is obvious: one shoulder-mounted (or similar) camera for wide views of the area I’m birding, and a zoom/telephoto camera to get nice, close-in views of the birds.
1080p because birding is going to require a lot of fine detail, and 720p just isn’t high enough resolution. 30 FPS is the absolute bare minimum for video to not look jerky, and 6Mb/s is as high as Twitch will ingest.
Being able to switch two cameras locally means that I won’t have to stream 12Mb/s out at all times and worry about switching streams on a remote OBS setup. This will also save on bandwidth costs and not require as much upstream bandwidth.
Low cost is also important, because I definitely didn’t want to spend a bunch of money on this.
Weight is also a factor, as my birding outings can easily last 4 hours and every gram of weight counts.
Resilient network connection because I don’t want to have to stop and restart the stream. It’d also be nice if the stream could go over multiple connections, due to handoffs between towers not being as “seamless” as they’re supposed to be.

The Setup

With these requirements in mind, I decided on the following general hardware setup:

  • Zoom Camera: Nikon P1000. A camera almost purpose-built for taking pictures of birds. It has clean HDMI output, which is needed to capture its video externally. I already had the camera when I started this. One downside is that the P1000’s screen cannot be used while HDMI out is active, requiring an external monitor.
  • Wide/shoulder Camera: GoPro Hero7 Black. Again chosen because I already had access to it. I don’t know if it’s a better choice than the usual Sony action cam; we’ll see. It only streams at 720p, which is fine for a proof of concept, and I can upgrade to the Hero8 Black‘s 1080p streaming if things work out.
  • Capture Device: Avermedia Live Gamer Mini (GC311). I’d initially looked at various HDMI→USB capture devices, including the AverMedia ExtremeCap UVC as well as a bunch of inexpensive AliExpress specials, but I ruled most of them out for not supporting Linux. Anything that supports UVC should work on Linux, but I didn’t want to take my chances given my limited budget. Why this specific one? It (unofficially) works fine on Linux and has a built-in H.264 encoder capable of 1080p30. I also tested the Avermedia Live Gamer Portable 2 Plus (GC513), which worked about the same, but is significantly larger and heavier, a more awkward shape, and draws slightly more power.
  • Encoder, Switcher and Network Control: Raspberry Pi 4. Low power, generally well supported, inexpensive, and with a hardware encoder rated for 1080p30. I’d initially tried the Raspberry Pi 3, but its encoder proved unable to work at higher than roughly 720p24 and 1.5Mb/s, which was well under my minimum requirements.
  • Operating System: Arch Linux ARM. I’m very familiar with Arch Linux, as well as the ARM version, and I suspected that I’d need to build some of the tools from source, which Arch Linux makes (relatively) easy to do.
My P1000 with HDMI monitor, connected to the GC513 capture device running from an RPi4 powered by a battery bank over USB. It works!

Putting it all Together

Getting the capture device recognized by Linux was very straightforward. I plugged it in and connected an HDMI input, and Linux instantly recognized it. Getting useful output out of it was a different matter.

The first problem was that the ffmpeg packages Arch Linux ARM ships don’t have support for the RPi’s built-in video encoders, so I needed to compile the package myself with some out-of-tree patches (which are hopefully due to be merged soon).
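
Once the patched build was in place, the pipeline I tested looked roughly like this. Treat it as a sketch rather than my exact command: the device path, input format and encoder name are assumptions and will vary by capture device and ffmpeg build.

# confirm the patched build exposes the Pi's hardware H.264 encoder
ffmpeg -encoders | grep h264_v4l2m2m
# pull 1080p30 from the UVC capture device and hardware-encode a one-minute test clip
ffmpeg -f v4l2 -input_format mjpeg -video_size 1920x1080 -framerate 30 -i /dev/video0 \
    -c:v h264_v4l2m2m -pix_fmt yuv420p -b:v 6M -t 60 test.mp4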

With a lot of trial and error, I managed to get the capture device working with the Raspberry Pi at 1080p30, the maximum the RPi4’s encoder is rated for. It just barely keeps up, though trans-rating (like transcoding, except changing only the bitrate rather than fully re-encoding the video) could potentially make things faster. I managed to run a stream for hours without any dropped frames, other than those caused by WiFi glitches.

I tried to get RTMP running well, but it choked on a cellular connection, and even on a good wired Ethernet connection the latency made it unusable for live, real-time video. One thing I did experiment with a lot, and successfully, is SRT (Secure Reliable Transport) for streaming. Bonus: SRT is slated to add connection bonding soon, claimed for this month (Feb. 2020).
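
The SRT test looked something like the following (hostname, port and encoder are placeholders; ffmpeg and ffplay both need to be built with libsrt for the srt:// protocol, and VLC works as a receiver too):

# sender on the Pi: encode and push an MPEG-TS stream over SRT
ffmpeg -f v4l2 -input_format mjpeg -video_size 1920x1080 -framerate 30 -i /dev/video0 \
    -c:v h264_v4l2m2m -pix_fmt yuv420p -b:v 6M \
    -f mpegts "srt://my-laptop.lan:9000?mode=caller"
# receiver on the laptop
ffplay "srt://0.0.0.0:9000?mode=listener"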

It lives! SRT streaming my dining room wall to VLC on my laptop via my main Linux environment. Dropped frames are from the WiFi, not on the sending end.

Other Things

The streaming on the GoPro Hero7 Black is unusable. It stops after around an hour no matter what I do, and it’s very annoying to get connected/reconnected, as it needs to use the app. There is a community project to implement an unofficial API, but it doesn’t look like it’ll fix the streaming issue. According to GoPro support it’s a hardware issue, which they won’t fix, instead telling me to buy a Hero8 Black to replace my broken, but in-warranty, Hero7. Boo.

HDMI switchers meant to switch inputs on TVs are a bad idea. I bought one model that wasn’t the cheapest, but at best I got about a half second of black screen when it switched over, and at worst it locked up the HDMI capture device, necessitating replugging the USB cable or even power-cycling the RPi4. I looked at higher-end HDMI switchers, but didn’t really find anything mobile, and definitely didn’t find anything inexpensive.

One, Two, N…

There’s a saying in computer circles that getting two of something working is just as much work as getting one of something working. And once you have two working, it’s just as much work again to generalize to n.

And the RPi4 doesn’t have enough decoding power to decode two streams, switch between them and re-encode the result. It just can’t handle that.

Next I tried a custom build of ffmpeg that allows switching between two streams, something stock ffmpeg doesn’t really support. The results were, now that I’m writing this in hindsight, entirely expected: garbled, unwatchable video for a few seconds after each switch, until the next I-frame shows up, because unrelated video frames are being smashed together.

No More Raspberry Pi?

It’s painful to spend so much time on something and realize that it just won’t work out, but between the RPi4’s encoder not being powerful enough to handle two streams and a major new issue I uncovered, I’m looking at alternatives.

What major new issue? Well, I was wondering if I could send two 1080p30 streams at once and calculated how much the data was going to cost me. We’re talking in excess of $80 per hour at Canadian cell phone rates. Oof, so that’s not viable, and even a single stream at roughly $40 an hour is getting expensive.
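
The back-of-envelope math, assuming a Canadian rate in the neighbourhood of $15/GB (plans vary):

2 x 6 Mb/s = 12 Mb/s = 1.5 MB/s ≈ 5.4 GB per hour
5.4 GB/hour x ~$15/GB ≈ $80 per hour
A single 6 Mb/s stream is half that: ≈ 2.7 GB/hour, or roughly $40 per hour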

I realized that H.264 is years old at this point, and the industry has generally started moving to H.265 (AKA HEVC). It claims double the efficiency, or better, so I should be able to target a 3Mb/s bitrate for the same quality as H.264’s 6Mb/s. With that in mind, I’m trying to find SBCs that support H.265 encoding and have enough decoding performance to work with two streams. It might also be easier to use two raw capture devices, such as Elgato’s Cam Link 4K, but that could require a much beefier system to drive.

I’ve looked at hundreds of boards at this point. Boards based on the RK3288 may work, but the driver situation looks awful. Another potential option is the LattePanda Delta, but it looks too hot and power hungry to be a good contender. There are other similar systems, such as the Atomic Pi, but they look like they may have similar issues with power usage.

So that’s where things are right now. I’m a bit stuck. Sorry RPi4, you’re good for a lot of things, but video isn’t one of them.

So close and yet so far…


Raspberry Pi Ceph Cluster – Testing Part 2

Having built a Ceph cluster with three Raspberry Pi 3 B+s and unsuccessfully tested it with RBD, it’s time to try CephFS and Rados Gateway. The cluster setup post is here and the test setup and RBD results post is here. Given how poorly RBD performed, I’m expecting similar poor performance from CephFS. Using the object storage gateway might work better, but I don’t have high hopes for the cluster staying stable under even small loads in either test.

Test Setup:

I’m using the same test setup as in the RBD tests. Two pools, one using 2x replication and the other using erasure coding. The test client is an Arch Linux system running a 5.1.14 kernel with Ceph 13.2.1, a quad-core CPU and 16GiB of RAM, connected to the cluster via 1GbE. I’m also running the OSDs with their cache limited to a maximum of 256MiB, and the metadata cache limited to 128MiB.
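
For reference, that cache limit corresponds to a ceph.conf snippet roughly like this (the same BlueStore options tuned in Part 1):

[osd]
bluestore_cache_size = 268435456      # 256MiB total BlueStore cache
bluestore_cache_kv_max = 134217728    # 128MiB max for the RocksDB metadata cache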

CephFS:

CephFS requires a metadata pool, so I created a replicated pool for metadata. Why not create both filesystems (replicated and erasure coded) at once? CephFS doesn’t currently support multiple filesystems on the same cluster, though there is experimental support for it.
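
Creating the filesystem and mounting it on the client went roughly like this (pool names, PG counts and the monitor address are illustrative):

ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 32
ceph fs new cephfs cephfs_metadata cephfs_data
# on the test client, using the kernel CephFS driver
mount -t ceph rpi-node1:6789:/ /mnt/test -o name=admin,secret=<admin key>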

With the pool created and the CephFS filesystem mounted on my test client, I started the dd write test with a 12GiB file using dd if=/dev/zero of=/mnt/test/test.dd bs=4M count=3072. The first run completed almost instantly, apparently fitting completely in the filesystem cache. CephFS doesn’t support iflag=direct in dd, so I simply reran the write test knowing that the cache was pretty full at this point. Almost instantly, with less than 1GiB written, two of the nodes simultaneously fell over and died. They were completely unreachable over the network, but this time I had connected an HDMI monitor to them to see the console. I quickly saw that kernel threads had been blocked for over 120 seconds, and the system was essentially unusable. A USB keyboard was recognized, but key presses weren’t registering. I power-cycled the systems after waiting at least five minutes, and they came back up fine.

A very unhappy Ceph cluster with two hosts that have locked up.

I tried running the test again, and both hosts quickly locked up. I was running top as they did so, and both hosts rapidly consumed their memory. Despite having the OSD’s cache set to a maximum of 256MiB, the ceph-osd process was using around 750MiB before the system became unresponsive. The OOM killer didn’t kill ceph-osd to save the system in this case, possibly because the kernel failed to allocate the memory it needed internally to do so. CephFS seems to hang the Raspberry Pis hard.

I decided to test a bunch of smaller files, because CephFS is meant more as a general filesystem with a bunch of files, whereas RBD tends to get used to store large VM images. I used sysbench to create 1024 16MiB files for a total of 16GiB on my 40GiB CephFS filesystem. Initially, things seemed to work fine. Sysbench reported that it created the 1024 files at 6.4MB/s. While this was just test preparation, it seemed to be a good sign.
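
The sysbench preparation and write test were along these lines (exact flags reconstructed from memory, so treat them as illustrative):

sysbench fileio --file-total-size=16G --file-num=1024 prepare
sysbench fileio --file-total-size=16G --file-num=1024 --file-test-mode=seqwr run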

What didn’t seem like such a good sign was when I actually started running the sysbench write test and Ceph started complaining about slow MDS ops. A lot of them. The sysbench write test immediately failed, citing an IO error on the file. Running ls -la showed a lot of 0-byte files, with a couple of 16MiB files. Ugh. I recreated the test setup, this time with writes at a blazing 540kB/s. When it finally finished several hours later, attempting to run the write tests showed the same truncation of files to 0B as before. This seemed to be a sysbench issue, but I didn’t spend much time troubleshooting it.

For completeness, I also tried an erasure coded pool with CephFS. As with RBD, the metadata isn’t supported on erasure coded pools and needs to live on a replicated pool. Results initially looked better, but the OSDs still exhausted their memory and caused host freezes, though after a longer time and with more data successfully ingested.

RadosGW:

I had intended to test RadosGW with the S3 API, but I decided against it. With two different failed tests, the chances for any test results that didn’t end with the cluster dying are pretty low.

The final cluster hardware setup, with three nodes mounted in a nice stack with shorter (and tidier) network cables. Blue cased Pi isn’t part of the cluster.

Conclusions:

The Raspberry Pi 3 B+ doesn’t have enough RAM to run Ceph. While everything technically works, any attempts at sustained data transfer to or from the cluster fail. It seems like a Raspberry Pi, or other SBC, with 2GB+ of RAM would actually be stable, but still slow. The RAM issue is likely exacerbated by the 900kB/s random write rate the flash keys are capable of, but I don’t have faster flash keys or spare USB hard drives to test with.

Erasure coding seems to be better on RAM-limited systems, and while it still failed, it always failed later, with more data successfully written. While it may have been more taxing on the Raspberry Pi’s limited CPU resources, those resources were rarely the bottleneck, with usage averaging around 25% across all 4 cores under maximum load.

The release of the Raspberry Pi 4 B happened while I was writing this series of blog posts. I’d love to re-run these tests on three or four of the 4GB models with actual storage drives. The extra RAM should keep the OSDs from running out of memory and dying or taking down the whole system, and the USB 3.0 interconnect means that storage and network access will be considerably faster. They might be good enough to run a small yet stable cluster, and I look forward to testing on them soon.

Raspberry Pi Ceph Cluster – Testing Part 1

It’s time to run some tests on the Raspberry Pi Ceph cluster I built. I’m not sure if it’ll be stable enough to actually test, but I’d like to find out and try to tune things if needed.

Pool Creation:

I want to test both standard replicated pools and Ceph’s newer erasure coded pools. I configured Ceph’s replicated pools with 2 replicas. The erasure coded pools are more like RAID in that there aren’t N full replicas spread across the cluster; instead, data is split into chunks, coding (checksum) chunks are computed, and the data and coding chunks are distributed across the pool. I configured the erasure coded pool with data split into 2 chunks plus one additional coding chunk. Practically, this means both pools should tolerate the same failures, but with 1.5x overhead instead of 2x.

I created one 16GiB pool of each type to test. Why not always use erasure coded pools? They’re more computationally complex, which might be bad on compute-constrained devices such as the Raspberry Pi. They also don’t support the full set of operations; for example, RBD can’t completely reside on an erasure coded pool. There is a workaround: the metadata resides on a replicated pool with the data on the erasure coded pool.
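
Pool creation went roughly like this (pool names and PG counts are illustrative):

# replicated pool with 2 copies
ceph osd pool create rbd_repl 64 64 replicated
ceph osd pool set rbd_repl size 2
# erasure coded pool with 2 data chunks and 1 coding chunk
ceph osd erasure-code-profile set k2m1 k=2 m=1
ceph osd pool create rbd_ec 64 64 erasure k2m1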

Baseline Tests:

To get a basic idea for what performance level I could expect from the hardware underlying Ceph, I ran network and storage performance tests.

I tested network throughput with iperf between each Raspberry Pi 3 B+ and an external computer connected over Gigabit Ethernet. Each got around 250Mb/s, which is reasonable for a gigabit chip connected via USB2.0. For comparison, a Raspberry Pi 3 B (not the plus version) with Fast Ethernet tested around 95Mb/s. As a control, I also tested the same iperf client against another computer connected over full gigabit at 950Mb/s.
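
The network test was plain iperf (hostnames are examples):

# on each Raspberry Pi
iperf -s
# on the external client
iperf -c rpi-node1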

Disk throughput was tested against a flash key with an XFS filesystem, using dd for sequential reads and writes and sysbench’s fileio module for random reads and writes. XFS used to be the recommended filesystem for Ceph until Ceph released BlueStore, and it’s still used for BlueStore’s metadata storage partition. The 32GB flash keys performed at 32.7MB/s sequential read and 16.5MB/s sequential write. Random read and write with 16KiB operations yielded 15MB/s and 0.9MB/s (that’s 900kB/s) respectively.
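
The baseline commands were along these lines (mount point and sizes are illustrative):

# sequential write and read against the XFS-formatted flash key
dd if=/dev/zero of=/mnt/osd/seq.dd bs=4M count=1024 oflag=direct
dd if=/mnt/osd/seq.dd of=/dev/null bs=4M iflag=direct
# random read/write with 16KiB operations
sysbench fileio --file-total-size=4G --file-block-size=16384 --file-test-mode=rndrw prepare
sysbench fileio --file-total-size=4G --file-block-size=16384 --file-test-mode=rndrw run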

RBD Tests:

RADOS Block Device (RBD) is a block storage service: you run your own filesystem on top, while Ceph’s replication protects the data and spreads access over multiple drives for increased performance. Ceph currently doesn’t support using pure erasure coded pools for RBD (or CephFS); instead the data is stored in the erasure coded pool and the metadata in a replicated pool. Partial writes also need to be enabled per-pool, as per the docs.
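
That workaround looks roughly like this (reusing the pool names from above; the image name and size are illustrative):

# allow partial (overwrite) writes on the erasure coded pool
ceph osd pool set rbd_ec allow_ec_overwrites true
# image metadata lives in the replicated pool, data in the erasure coded pool
rbd create --size 16G --pool rbd_repl --data-pool rbd_ec test-image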

Once the pools were created, I mounted each and started running tests. The first thing I wanted to test was just sequential read and write of data. To do this, I made an XFS filesystem with the defaults, mounted it on a test client (Arch Linux, kernel 5.1.14, quad-core, 16GiB RAM, 1xGbE, Ceph 13.2.1) and wrote a file using dd if=/dev/zero of=/mnt/test/test.dd bs=4M count=3072 iflag=direct. Initially, writes to the replicated rbd image looked decent, averaging a whopping 6.4MB/s. And then the test client suddenly got really cranky.

The Ceph manager dashboard’s health status. Not what I’d been hoping for during a performance test.

One host, rpi-node2, had seemingly dropped from the network. Ceph went into recovery mode to keep my precious zeroes intact, and IO basically ground to a halt as the cluster recovered at a blazing 1.3MiB/s. I couldn’t reach the node over SSH, so I power-cycled it. It came back up, Ceph realized that the OSD was back and cut short the re-balancing of the cluster. To re-run the write test fresh, I deleted the test file and ran fstrim -v /mnt/test, which tells Ceph that the blocks can be freed, so the space is released on the OSDs.

The second test ended similarly to the first, with dd writing happily at 6.1MB/s until rpi-node3 stopped responding (including to SSH) at 3.9GB written. This time I stopped dd immediately and waited for the node to come back, which it did after almost two minutes. I checked the logs and saw that the system was running out of memory and the ceph-osd process was getting OOM killed. I also noticed that both nodes that had failed were the ones running the active ceph-mgr instance serving the dashboard.

I ran the test again, this time generating a 1GiB file instead, and confirmed that it was the node with the active ceph-mgr instance that was running out of memory. I also let the write finish, testing how well Ceph ran in the degraded state. At 2.8MB/s, no performance records were being set, but the cluster was still ingesting data with one node missing.

I had two options: the first was to move the ceph-mgr daemon to another device, but I wanted the cluster to be self-contained to three nodes, so I opted for the second, which was to lower the memory usage of the ceph-osd daemon. I looked at the documentation for the BlueStore config and saw that the default cache size is 1GiB, or as much RAM as the Pi has. That just won’t do, so I added bluestore_cache_size = 536870912 and bluestore_cache_kv_max = 268435456 to my ceph.conf file and restarted the OSDs. This means that BlueStore will use at most 512MiB of RAM for its caches, with at most 256MiB of that for the RocksDB metadata cache.
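
In ceph.conf that ends up looking roughly like:

[osd]
bluestore_cache_size = 536870912      # 512MiB total BlueStore cache
bluestore_cache_kv_max = 268435456    # 256MiB max for the RocksDB metadata cache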

I reran the 1GiB file test and got 3.5MB/s write speed with no OSDs getting killed. With the 16GiB file, writes averaged 3MB/s, but around the halfway mark RAM usage eventually got the OSD running on the same node as the active ceph-mgr killed. Again, the cluster survived, just in a less happy state until the killed OSD daemon restarted. I disabled the dashboard, and while this helped RAM usage, the ceph-osd daemon was still getting killed. I further dropped the BlueStore cache to 256MiB, with 128MiB for the metadata cache. This time I locked two of the three nodes up hard and needed to power-cycle them.

During the 1GiB file tests, this was the unhappiest the cluster got. The PGs were simply slow to replicate and caught up quickly once the write load subsided.

With the replicated testing being close to a complete failure, I moved on to testing the erasure coded pool. I expected it to be worse due to the increased amount of compute resources needed. I was wrong. I was able to successfully write the test file, and the worst I got was an OSD being slow, not responding to the monitor fast enough, and then recovering a few seconds later. Sequential writes averaged 5.7MB/s and sequential reads averaged 6.1MB/s, but I still had two nodes go down at different times. It seems that erasure coded pools perform slightly better, but they can still cause system memory exhaustion.

One thing to note is that even with three nodes, Ceph never lost any data and was still accepting writes, just very slowly. I hadn’t intended to test Ceph’s resiliency, as that has already been well tested, but it was nice to see that it kept serving reads and writes.

At this point, I don’t think that the 1GiB of RAM is enough to run RBD properly. Sequential writes looked to be around 6MB/s when the cluster hadn’t lost any OSDs or whole hosts. I never attempted to test random access, due to the issues with sequential reads and writes.

CephFS and Rados Gateway:

With RBD being a bit of a bust, I wanted to see if CephFS and Rados Gateway performed better. As this post is getting long, CephFS and RadosGW results are in a second post, along with a conclusion.


Raspberry Pi Ceph Cluster – Setup

I’ve used Ceph several times in projects where I needed object storage or high resiliency. Other than using VMs, it’s never been something I could easily run at home to test, especially not on bare metal. Circa 2013, I tried to get it running on a handful of Raspberry Pi Bs that I had, but they had far too little RAM to compile Ceph on directly, and I couldn’t get Ceph compiling reliably with a cross-compiler toolchain. So I moved on with life. Recently, I’ve been looking at Ceph as a more resilient (and more expensive) replacement for my huge file server, and I found that it’s now packaged in Arch Linux ARM’s repos. Perfect!

Having that hurdle out of the way, I decided to get a 3 node Ceph cluster running on some Raspberry Pis.

The Hardware:

Ceph Cluster Gear
All the gear needed to make a teeny Ceph cluster.

Hardware used in the cluster:

  • 3x Raspberry Pi 3 B+
  • 3x 32GB Sandisk Ultra microSD card (for operating system)
  • 3x 32GB Kingston DataTraveler USB key (for OSD)
  • 3x 2.5A microUSB power supplies
  • 3x Network cable
  • 1x Netgear GS108E Gigabit Switch and power supply

My plan is to have each node run all of the services needed for the cluster. This isn’t a recommended configuration, but I only have three Raspberry Pis to dedicate to this project. Additionally, I’m using a fourth Raspberry Pi as an automation/admin device, but it isn’t directly participating in the cluster.

Putting it all Together:

  • Create a master image to save to the three SD cards. I grabbed the latest image from the Arch Linux ARM project and followed their installation directions to get the first card set up.
  • Note: I used the Raspberry Pi 2 build, because the Ceph package for aarch64 on Arch Linux ARM is broken. I’m also using Ceph 13.2.1 because that’s what Arch packages; it’s uncharacteristically almost a year old, and I wasn’t going to try to compile 14.2.1 myself again.
  • Once the basic operating system image was installed, I put the card into the first Raspberry Pi and verified that SSH and networking came up as expected. It did, getting a static IP address from my already configured DHCP server.
  • I first tried to use Chef to configure the nodes once I got them running. Luckily Arch has a PKGBUILD for chef-client in the AUR. Unluckily, it’s not for aarch64. Luckily again, Chef is supported on aarch64 under Red Hat. I grabbed the PKGBUILD for x86_64, modified it to work with aarch64, built and installed the package.
  • I created a chef user, gave it sudo access, generated an SSH key on my chef-server, and copied it to the node.
  • At this point, I had done as much setup on the single node I had, so I copied the image onto the other two microSD cards, and put them into the other Pis.
  • Chef expects to be running on a system that isn’t Arch Linux. After some time fighting with it, I decided that I’d spent enough time trying to get it working.
  • With Chef a bust, I moved on to Ansible and re-imaged the cards to start fresh.
  • ceph-ansible initially worked better, due to Ansible being supported on Arch Linux, but the playbook doesn’t actually support Arch. I needed to make some modifications to get the playbook to run.
  • With some basic configuration of the playbook ready to go, I got the mons up and running pretty easily. But OSD creation failed on permissions issues. Something in the playbook was expecting a different configuration than Arch Linux uses. Adding the ceph user to the “disk” and “storage” groups partially fixed the permissions issues, but OSD creation was still failing. Ugh.
Ceph cluster in operation. The rightmost blue Raspberry Pi is being used as an admin/automation server and isn’t part of the actual Ceph cluster.

Time for Manual Installation:

While part of my goal was to try some of the Ceph automation, Chef’s and ceph-ansible’s lack of support for Arch Linux meant that I wasn’t really making progress on my main goal, which was to get a small Raspberry Pi cluster up and running.

So I re-imaged the cards, used an Ansible bootstrap playbook that I wrote and referred to Ceph’s great manual deployment documentation. Why manually deploy it when ceph-deploy exists? Because ceph-deploy doesn’t support Arch Linux.

MONs:

MONs, or monitors, store the cluster map, which is used to determine where to store data to meet the reliability requirements. They also do general health monitoring of the cluster.

After following all of the steps in the Monitor Bootstrapping section, I had the first part of a working cluster, three monitors. Yay!

One difference from the official docs: in order to get the mons starting on boot, I needed to run systemctl enable ceph-mon.target in addition to enabling the individual ceph-mon daemons; otherwise systemctl listed their status as Active: inactive (dead) and they didn’t start.
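
Concretely, on each node that looks something like this (N being the node number):

systemctl enable --now ceph-mon@rpi-nodeN
systemctl enable ceph-mon.target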

MGRs:

The next step was to get ceph-mgr running on the nodes with mons on them. The managers provide interfaces for external monitoring, as well as a web dashboard. While Ceph has documentation for this, I found Red Hat’s documentation more straightforward.

In order to enable the dashboard plugin, two things needed to be done on each node:

  • First, run ceph dashboard create-self-signed-cert to generate the self-signed SSL certificate used to secure the connection.
  • Then run ceph dashboard set-login-credentials username password, with the username and password credentials to create for the dashboard.
  • Running ceph mgr services then returned {"dashboard": "https://rpi-node1:8443/"}, confirming that things had worked correctly, and I could get the dashboard in my browser.

OSDs:

Now for the part of the cluster that will actually store my data: the OSDs, which use the 32GB flash keys. If I wanted to add more flash keys, or maybe even hard drives, I could easily add more OSDs.

I followed the docs, adding one BlueStore OSD per host on the flash key. One note: as I’d already tried to set the keys up using ceph-ansible, they had leftover GPT partition tables. I ran ceph-volume lvm zap /dev/sda on each host to fix this.

Additionally, I didn’t realize that the single ceph-volume command also sets up the systemd service, so I created my own and ended up with an extra OSD daemon that had no storage in the cluster map. I followed Ceph’s documentation and removed that OSD, but now my OSD IDs start at 1 instead of 0.
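
The per-host OSD creation boils down to something like this (same device path as the zap command above):

# wipe leftover partition tables from the earlier ceph-ansible attempt
ceph-volume lvm zap /dev/sda
# create a BlueStore OSD on the flash key; this also creates and enables the systemd unit
ceph-volume lvm create --bluestore --data /dev/sda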

MDSs:

I plan on testing with CephFS, so the final step is to add the Metadata Server, or MDS, which stores metadata related to the filesystem. The ceph-mds daemon was enabled with systemctl enable ceph-mds@rpi-nodeN (N being the number of that node) and then systemctl enable ceph-mds.target so that the MDS actually starts.

Now What?

A healthy Raspberry Pi Ceph cluster, ready to go.

The cluster can’t store anything yet, because there aren’t any pools, radosgw hasn’t been set up and there aren’t any CephFS filesystems created.

If I had more Pis, I’d love to test things with more complex CRUSH maps.

The next blog post will deal with testing of the cluster, which will likely include performance tuning. Raspberry Pis are very slow and resource constrained compared to the Xeon servers I’ve previously run Ceph on, so I expect things to go poorly with the default settings.

Results:

I split the results into two parts: the first includes the test setup and RBD tests, and the second includes CephFS and radosgw tests as well as a conclusion.