I would love the child of a Surface Book and a Framework laptop; or a bare keyboard attached to a screen that I could plug my phone into (possibly running Phosh) and use as the hardware for a laptop experience.

  • j4k3@lemmy.world · 1 year ago
    • Open source motherboards
    • Open source modems for computers and phones
    • Open source cars
    • GrapheneOS phone with enough RAM to run a decent offline LLM
    • Offline AI privacy/network manager designed to white-noise the stalkerware standards of the shitternet, with a one-click setup
    • Real AI hardware designed for tensor math using standard DIMM system memory, with many slots and buses in parallel, instead of bleeding-edge monolithic GPU stuff targeting a broad market. The bottleneck in the CPU structure is the L2-to-L1 cache bus width and transfer rate when massive tensor tables all need to be live at once. System memory is great for its size, but that size is only possible because the memory controller swaps in a relatively small chunk that is actually visible to the CPU at any moment. A GPU, by contrast, has no such memory controller: its memory size is directly tied to the compute hardware. That is the key difference we need to replicate. What we need is a bunch of small system-memory sticks where the chunk normally visible to the CPU is all that is used, with each stick on its own bus running to the compute hardware. Then older, super-cheap system memory could be paired with ultra-cheap trailing-edge compute hardware to make cheaper AI that could run larger models (at the cost of more power consumption).

      Currently, GPUs with more than 24 GB of VRAM are pretty much unobtainium; an A6000 with 48 GB will set you back at least $4k. I want to run a 70B model or greater. That would need ~140 GB of VRAM to run super fast on dedicated, optimized hardware. There is already an open-source offline 180B model, and that would need ~360 GB for near-instantaneous response. While super speed with these large models is not needed for basic LLM prompting, it makes a big difference with agents, where the model needs to do a bunch of stuff seamlessly while still appearing to work conversationally in real time.
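
      The ~140 GB and ~360 GB figures above fall out of a simple 2-bytes-per-parameter (fp16) estimate, and decode speed is roughly memory-bandwidth-bound because every generated token streams the full weight set once. A back-of-envelope sketch (the 2 bytes/param assumption and the DDR4 bandwidth numbers are illustrative, ignoring KV cache and overhead):

      ```python
      # Rough memory and speed estimates for dense LLMs at fp16.
      # Assumptions: 2 bytes per parameter; no KV cache or runtime overhead counted.

      def weight_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
          """Approximate weight memory in GB: 1e9 params * bytes / 1e9 bytes-per-GB."""
          return params_billion * bytes_per_param

      def tokens_per_sec(params_billion: float, bandwidth_gbs: float,
                         bytes_per_param: float = 2.0) -> float:
          """Decode is bandwidth-bound: each token reads all weights once."""
          return bandwidth_gbs / (params_billion * bytes_per_param)

      for size in (70, 180):
          print(f"{size}B fp16 ~ {weight_gb(size):.0f} GB of weights")
      # -> 70B ~ 140 GB, 180B ~ 360 GB, matching the figures above.

      # One DDR4-3200 channel is ~25.6 GB/s; eight parallel channels ~ 205 GB/s,
      # which is the kind of many-bus DIMM setup described above.
      print(f"70B @ 205 GB/s ~ {tokens_per_sec(70, 205):.1f} tok/s")
      ```

      The point the sketch makes: capacity comes cheap with commodity DIMMs, but you need many independent channels in parallel before the tokens-per-second becomes conversational.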