Hello,

On Mon, Nov 19, 2007 at 12:58:11AM -0500, Nic Watson wrote:
> Andrea Arcangeli wrote:
> > On Sun, Nov 11, 2007 at 11:20:13PM -0500, Nic Watson wrote:
> >> OK. I think I need a seccomp primer. Any pointers? I'm puzzled by the
> >
> > a very short doc can be found here http://www.win.tue.nl/~aeb/linux/lk/lk-14.html
>
> Thanks. That's what I needed.

You're welcome. I really appreciate the high quality feedback I've been receiving over time from you, Ed, Alex, Gregory and lots of other really helpful people.

> > The two main problems in an approach ala EC2 is that the CPUShare
> > virtual machines must run into a VPN and they can't be exposed to the
> > live internet, and this has to be enforced at the virtual machine
> > level.
>
> This is doable.

I thought more about this, and openvpn can probably run just as well in the userland of the virtual machine itself, requiring only a tiny modification to the "-net user" (default) model to limit the destination IP of the tcp/udp connections created by KVM to the buyer IP. The only other complication from a KVM standpoint is that the /dev/kvm device will need to be accessible by the cpushare gid.

OTOH an interesting KVM feature is "-net socket", which practically already creates a virtual ethernet to connect different KVM guests using TCP sockets. AFAICS the downsides of the already existing "-net socket" model compared to the "-net user + my ip destination firewall" model are:

1) the "-net socket" model requires KVM to run in the buyer computer too

2) tcp over tcp sounds like a big problem on the internet (openvpn would run over udp to avoid congestion control over congestion control troubles)

Or am I missing something about "-net socket"?
3) communications wouldn't be encrypted. Clearly the seller can see the contents of /dev/mem anyway, but only that seller can see them if encryption is used in the network (the ISP of the buyer will see nothing at all w/ encryption vs everything w/o encryption)

Despite being apparently a perfect fit, I'm not satisfied by the "-net socket" KVM model. It's all I need from a strict security standpoint, but far from the best from a performance and privacy standpoint. And reinventing OpenVPN inside qemu doesn't sound like a good idea.

> > The second problem is that this storage will go away at any
> > time if the client disconnects. The only reliable possible storage is
> > on the buyer hardware, providing persistent storage in the seller
> > hardware is not feasible or at least not easily and even if feasible
> > at all with fault tolerance it still wouldn't be 100% reliable.
>
> I agree. However, if you had the VPN, could you not just use an NFS
> client (or something similar) to the buyer for persistent storage? The

Yes, with openvpn running in the seller-guest environment, NFS can be mounted from the server just fine, over either TCP or UDP. Sellers can run listening sockets too and talk to each other directly (it won't be noticeable that the traffic between two sellers is passing through the buyer ip).

> trickier problem to resolve is how to transfer the initial image. This
> implies a requirement for a persistent reliable cache.

The persistent reliable cache of the initial image isn't a problem. Seccomp bytecode is already cached too. The size of the image is the only issue: it'll be much, much bigger than the seccomp bytecode... The buyer will have to upload the whole image to every new client. All transfers will be securely encrypted by the cpushare protocol though (guaranteed by the secret ssl key in the cpushare server, even if they don't pass through the server!).
I tend to think that running it with -snapshot would be wrong because it would create unsafe, exploitable memory overhead: -snapshot requires copy-on-write, but the destination of the copy is RAM and not disk. So I think it's better to copy the image to a temporary file, start KVM and then delete the image file.

> UML can run under seccomp? How?

Not right now of course...

> >> It is tempting to think of adding callbacks to the sell client python
> >> script to perform certain operations you can't in the seccomp jail, like
> >> mounting a RAM disk. However, if you add more than a handful of
> >> callbacks, you're essentially implementing your own OS again.
> >
> > If the interprocess communication isn't too heavy, you very well can
> > write your own protocol to communicate with the buyer and have your
> > per-seller directory where open/read/write/close() run by the seccomp
> > binary in the seller cpu, will execute open/read/write/close in the OS
> > of the buyer. No server or client changes are required for this.
>
> This would be useful.

Yes, this would be the secure way to implement a networked fs in userland on top of CPUShare. However, the fact that not a single buyer has contacted me about a real-life app yet, and that not even malloc is working yet, is strongly convincing me that seccomp will never allow CPUShare to get out of the chicken-and-egg loop, even if I still think it is the most secure model and not too complicated in practice.

> I'm aware of the security issues of virtualization (that's a great link,
> though). It would be an interesting analysis, comparing seccomp with,
> say, QEMU without a kernel module, in terms of process isolation.

Dyngen should be fairly secure, but I think the most secure of all is KVM, thanks to eliminating the whole dyngen complexity from qemu and mostly depending on hardware features.
But the major point against dyngen is that, despite being totally cute and smart, from a business standpoint it is useless because it is too slow for anything real. kqemu would be ok, but it still uses dyngen for the kernel and I'd rather limit the number of ways CPUShare can be exploited. I'd be sad to hit bugtraq because of a dyngen related kqemu issue. If I have to hit bugtraq because of a KVM guest->host exploit, that's now going to be unavoidable because KVM is by far the best, but I want to at least avoid the avoidable (i.e. dyngen risks).

> Yeah, that's what EC2 does, except for isolating the VMs from the
> internet (except for NAT).

If EC2 in theory could allow you to run a three-way-handshake syn-ack-ack flood too, to take any server on the internet down, I guess Amazon has enough cash to handle the resulting legal action from the unlucky destination IP of the attack. With CPUShare this can't be allowed: I can't expose the sellers or myself to any legal issue (clearly those would be minor issues because the buyer IP+port! would be logged in the cpushare servers, like it clearly has to happen with Amazon, but it'd still be way too much of a problem for any seller). Trying to avoid any legal risk for the sellers and myself is the top priority of the project.

So I must enforce the destination IP of all KVM generated packets to be the buyer IP only. That way the buyer can only take itself off the internet, not anybody else. But then openvpn can connect all the other seller nodes in the same network, so they will talk to each other and they'll practically get out of their NATs and move inside the buyer corporate firewall. Openvpn can run natively in the host of the buyer (no virtualization would be needed in the buyer hardware with this model). With openvpn the buyer could even connect with rdesktop to a windows box running in the seller guest if he manages to figure out how to start openvpn at boot in windows... this is just an example of course.
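The rule above (every connection created by the guest must have the buyer IP as its destination) can be sketched as a simple predicate. This is only an illustration in Python, not a patch to the "-net user" code in qemu/KVM: the function names are hypothetical and the addresses are taken from the reserved documentation ranges.

```python
import ipaddress


def make_destination_filter(buyer_ip):
    """Return a predicate that accepts only connections to buyer_ip.

    Hypothetical helper: the real check would live in the user-mode
    network stack of the virtual machine, not in Python.
    """
    buyer = ipaddress.ip_address(buyer_ip)

    def is_allowed(dest_ip):
        # Parse the destination so that different textual spellings of
        # the same address compare equal to the buyer address.
        return ipaddress.ip_address(dest_ip) == buyer

    return is_allowed


# 192.0.2.10 stands in for the buyer, 198.51.100.7 for any other host.
is_allowed = make_destination_filter("192.0.2.10")
print(is_allowed("192.0.2.10"))    # True: traffic to the buyer passes
print(is_allowed("198.51.100.7"))  # False: everything else is dropped
```

With such a check in place the guest can still reach the other sellers, but only through the openvpn tunnel that terminates at the buyer.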
> I was first thinking, why not just emulate the EC2 model. I see,
> though, that the threat profile is different. Amazon has their compute
> cluster isolated from their corporate network. A compromised node in
> EC2 might screw with other VMs, but that's about it.
>
> A CPUShare seller likely does have that isolation. That's a big difference.
>
> Thanks for the discussion, and good luck with CPUShare.

Thanks ;). Your post was the icing on the cake in convincing me to switch. KVM or Xen was planned for a long time but I considered it low priority until now. Just a few weeks ago I was still thinking of improving the seccomp protocol instead of adding KVM...

I don't care much about the new design as long as it avoids requiring any software modification at all, and I don't just mean of the applications, I mean of the OS too. I'm so sick of people not being able to use CPUShare because of the software modifications required to their existing applications that this time I will go the extra mile and allow windows client/server software to run fine and totally unmodified on CPUShare (the CPUShare buy client will likely still run under a linux guest in VirtualPC, but then both the KVM guests in the sellers and the host of the buyer can perfectly well run windows in the new CPUShare design). So even rendering software purely dedicated to windows will run, if you're ok with uploading hundreds of megs of windows image for every new client connected, if your windows licenses allow it, and if the protocol is fault tolerant about nodes going down at any time. Of course I doubt anybody will ever run windows in the guests, more likely they will run *bsd, but this is the model I want now because I want to allow _anything_ to run on CPUShare ASAP.

I'll only support x86-64. A single x86-64 root image will be required and KVM currently only runs on x86-64 anyway.
DNS may still be allowed to route through any destination IP, so with dyndns in the openvpn configuration the buyer IP can be dynamic and not require any image change.

The KVM model will be less secure than seccomp by a large margin (think that even the parsing of the qcow2 disk image to boot in the virtual machine could have a buffer overflow to exploit...). However, the current maximally secure CPUShare with no buyers hasn't been useful _at_all_ so far. The CPU performance will be 100% native with KVM (the KVM virtualization performance hit is visible in kernel-guest loads when no paravirtualization is used, but these aren't kernel-guest loads).

Supporting seccomp and KVM at the same time, as initially planned, would be possible but it would be a mess: the mips/mflops performance figures measured for seccomp couldn't be applied reliably to KVM. So my current roadmap is to finish some details on the website, then make a tag in the HG repository and then rewrite the cpushare protocol to only depend on KVM. Large parts will be shared.

The major limitation is that VirtualPC won't allow KVM to run. So the livecd will only work if booted on the real hardware, but then the good thing is that it'll be easy to stack trusted computing on top of KVM using the livecd, because the livecd will only work on top of the hardware. Practically speaking, only tracking hashes of livecds is going to be feasible; hashes of Linux Kernels or Xen hypervisors would not work at all in an open source world. This way I will implement a switch in the buy order like "buy only trusted nodes", and I won't require the buyer itself to keep track of the livecd hashes. Furthermore I'll be able to nuke insecure attestation-exploitable livecds out of the trusted hashlist if such an exploit ever materializes.
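The trusted hashlist and the "buy only trusted nodes" switch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CPUShare protocol: the class and function names are hypothetical, SHA-256 is an assumed choice of hash, and the attestation step that would make a seller's reported hash believable is not modelled at all.

```python
import hashlib


class TrustedLiveCDs:
    """Server-side list of livecd image hashes considered trusted."""

    def __init__(self):
        self._trusted = set()

    @staticmethod
    def digest(image_bytes):
        # SHA-256 is an assumption; the post does not name a hash.
        return hashlib.sha256(image_bytes).hexdigest()

    def trust(self, image_bytes):
        self._trusted.add(self.digest(image_bytes))

    def revoke(self, image_bytes):
        # "Nuke" a livecd release that turned out to be exploitable.
        self._trusted.discard(self.digest(image_bytes))

    def is_trusted(self, reported_hash):
        return reported_hash in self._trusted


def eligible_sellers(sellers, hashlist, trusted_only):
    """sellers maps a seller name to the livecd hash it reported."""
    if not trusted_only:
        return sorted(sellers)
    return sorted(name for name, reported in sellers.items()
                  if hashlist.is_trusted(reported))


# Stand-in byte strings replace real livecd images in this example.
hashlist = TrustedLiveCDs()
hashlist.trust(b"livecd release A")
sellers = {
    "seller1": TrustedLiveCDs.digest(b"livecd release A"),
    "seller2": TrustedLiveCDs.digest(b"locally modified livecd"),
}
print(eligible_sellers(sellers, hashlist, trusted_only=True))
print(eligible_sellers(sellers, hashlist, trusted_only=False))
```

Keeping the list on the server side is what lets the buyer use a single switch in the order instead of tracking livecd hashes itself, and revoking a release only requires removing one entry.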
So the new CPUShare design will be purely KVM x86-64 dependent, the new livecd including KVM will be bigger and it will only work if booted on the real hardware, but the new livecd will provide the buyer with the trusted computing guarantee. In short, the only downsides compared to a real cluster will be, first, that the seller nodes may go down at any time, requiring fault tolerance in the application managing the cluster and running in the buyer hardware, and second, that there isn't any local storage.

Then we'll see what happens... I can always go back to seccomp if nobody uses the KVM model. If in the meantime somebody places a large order with seccomp I can still change my mind and not nuke seccomp out of CPUShare ;).

The current seccomp model will keep working for quite some time still, since it will take some time for this change to materialize, so if somebody feels they need it more than the KVM model, let me know.

Andrea
--
cpushare-discuss_X_at_X_cpushare.com mailing list - http://www.cpushare.com/
To unsubscribe, send mail to cpushare-discuss-unsubscribe_X_at_X_cpushare.com

Received on 2007-11-23 04:51:25