%%
Title:
Created: 2023-04-04 19:15
Status:
Parent: [[Projects/Computing]]
Tags:
Source:
%%

# ComputeBlade

- [Source](https://git.wntrmute.dev/kyle/bladerunner) ([Github mirror](https://github.com/kisom/bladerunner))
- [Docs](https://bladerunner-docs.wntrmute.dev/)

I’ve got a 10-[[Resources/Hardware/ComputeBlade|blade]] enclosure coming (at some unspecified date in the future) with a capacity of:

- 40 Cortex-A72 cores @ 1.5 GHz (in the form of [[Resources/Computing/Ambient/Platform/RaspberryPi|Raspberry Pi CM4s]])
- 80 GB of RAM
- 6 TB of storage
- 6x Coral TPUs

This was the genesis of the project, but it has expanded to include [[Resources/Computing/Ambient/Platform/RaspberryPi|RPi 4]]s as well. There’s a [Git repo](https://git.wntrmute.dev/kyle/bladerunner) for the infrastructure (with a [Github mirror](https://github.com/kisom/bladerunner)). The docs are [hosted on Netlify](https://bladerunner-docs.wntrmute.dev/) and are rebuilt on each commit.

### Ideas

Secure compute infrastructure:

- Managing identities for a number of machines.
- Build infrastructure for AI/ML projects.
- Miscellaneous services
- K8S as a learning exercise
- AI/ML training

### BOM

| Item                         | Quantity | Source | Notes                    |
| ---------------------------- | -------- | ------ | ------------------------ |
| DEV Compute Blade (batch 02) | 5        | KS     | TPM 2.0, µSD, nRPIBOOT   |
| Custom Heat Sink (DEV)       | 4        | KS     |                          |
| Custom Heat Sink (DEV)       | 1        | KS     | weird                    |
| TPM Compute Blade (batch 01) | 5        | KS     | TPM 2.0                  |
| Custom Heat Sink (TPM)       | 5        | KS     |                          |
| AI Module                    | 3        | KS     | M.2 addon board          |
| Raspberry Pi CM4 (8GB RAM)   | 10       | KS     | no eMMC/wifi/bt          |
| Latch                        | 10       | KS     |                          |
| Real Time Clock (RTC) Module | 10       | KS     |                          |
| 10" BladeRunner (1U rack)    | 1        | KS     | Fits 10 blades, 3D print |
| Dumb Fan Unit                | 4        | KS     | Fits 2 blades            |
| Smart Fan Unit               | 2        | KS     | Fits 2 blades            |
| SAMSUNG 970 EVO Plus 500GB   | 4 / 7    | AMZ    | 2280                     |
| SAMSUNG 970 EVO Plus 1TB     | 2 / 3    | AMZ    | 2280                     |
| Netgear GS316PP              | 1        | AMZ    | 16-port PoE+ (183W)      |
| Cat 6 RJ45, 1 metre          | 10       | AMZ    |                          |

I still need to figure out what I’m doing about the management interface; I’m thinking a CM4 board with a pair of Ethernet ports.

### Power budget

- Constraint: 183W total available power, with up to 30W per port
	- 11.4W per port, averaged across the 16 ports
	- 18.3W per blade in the 10-blade configuration
- Various reports indicate ~7W per blade under load; call it 10W to be safe
- Each Coral TPU draws up to 2W, so +4W per AI module
- 7x baseline blades @ 10W
- 3x AI-equipped blades @ 14W
- 112W total power under load

(A quick sanity check of this arithmetic is sketched at the end of this note.)

### AI modules

I ended up grabbing three AI modules, not the four originally planned, due to price constraints. The RAID module isn’t available yet. ~~Once they go live, I’d like to add four of the AI boards for model training and building an AI workload cluster, and potentially two RAID1 boards, each with 1TB of storage.~~

### PoE switches

- [Netgear GS316PP 16-port PoE+ 183W](https://www.amazon.com/dp/B0824HNVRY/)

### Things to follow up on

- [Boot a Raspberry Pi 4 using u-boot and Initramfs](https://hechao.li/2021/12/20/Boot-Raspberry-Pi-4-Using-uboot-and-Initramfs/)
- [How to unlock a LUKS volume on boot on Raspberry Pi OS](https://linuxconfig.org/how-to-unlock-a-luks-volume-on-boot-on-raspberry-pi-os)
- [How to PXE boot a Raspberry](https://www.howtoraspberry.com/2022/03/how-to-pxe-boot-a-raspberry/)
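
### Power budget sanity check

A minimal back-of-the-envelope script for the power budget section above. The constants are the working figures from this note (183W switch budget, 10W per baseline blade, 2W per Coral TPU, two TPUs per AI module); the names and structure are just mine for illustration, not anything from the bladerunner repo.

```python
# Sanity check for the ComputeBlade power budget.
# All figures are the assumptions from the note above; none are measured.

PORT_COUNT = 16            # Netgear GS316PP ports
POE_BUDGET_W = 183         # total PoE+ budget for the switch
POE_PER_PORT_MAX_W = 30    # PoE+ per-port ceiling

BLADES = 10                # blades in the enclosure
AI_BLADES = 3              # blades carrying an AI module
BASELINE_BLADE_W = 10      # ~7W reported under load, padded to 10W
CORAL_TPU_W = 2            # per TPU; each AI module carries two TPUs

per_port_avg = POE_BUDGET_W / PORT_COUNT          # ~11.4W
per_blade_avail = POE_BUDGET_W / BLADES           # 18.3W
ai_blade_w = BASELINE_BLADE_W + 2 * CORAL_TPU_W   # 14W
total_load = (BLADES - AI_BLADES) * BASELINE_BLADE_W + AI_BLADES * ai_blade_w

print(f"average budget per port: {per_port_avg:.1f}W")
print(f"available per blade:     {per_blade_avail:.1f}W")
print(f"estimated load:          {total_load}W of {POE_BUDGET_W}W")

# Every blade must stay under the PoE+ per-port ceiling, and the whole
# enclosure must stay under the switch's total budget.
assert ai_blade_w <= POE_PER_PORT_MAX_W
assert total_load <= POE_BUDGET_W
```

Running it prints 11.4W per port, 18.3W per blade, and 112W of 183W under load, matching the figures in the power budget section.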