• Septimaeus@infosec.pub
    link
    fedilink
    English
    arrow-up
    1
    ·
    7 hours ago

    I drafted several local-only designs for testing these models and that estimate looks about right (for sharding only, where most ancillary strategies come with some pretty big tradeoffs).

    It turns out the actual answer in practice is specialized private compute rental, meaning the actual hardware is generally shared. That made a lot more sense to me once I had an intuitive grasp of the fact that every second of idle time on a platform capable of running these bigger models well remains an intolerable expense, simply because they are built for specialized enterprise data center infrastructure.

    Put another way, to host locally you would end up needing to rent the idle time in order to make the infrastructural investment make sense financially, at which point you’re running your own data center and arrive at essentially the same destination.