• 0 Posts
  • 40 Comments
Joined 2 years ago
Cake day: June 7th, 2023

  • Just a couple thoughts (I have a mix of 2.5Gb and 10Gb):

    • Mikrotik switches are a nice alternative to Unifi. Much less lipstick on the UI but reliable and fairly priced.

    • If possible, you’ll probably want to use your own router rather than the all-in-one provided by the ISP. In my case, the router provided to me (Eero brand) did not even have a port fast enough for my service, and would have been an instant bottleneck.

    • Options for 10Gb-capable PCIe adapters (what you might put in your server or desktops) are more limited (at least they were when I transitioned a couple of years ago). Intel-based network adapters seem to require less effort to get working (driver-wise) vs. some of the other 10Gb / SFP+ capable adapters.

    Finally, you are correct: nobody needs an 8Gb internet connection. Aside from well-seeded torrent transfers, you will almost never reach that limit (and probably not even then). You’ll also need an adequate storage backend to write that fast.
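To put the storage point in numbers, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 Gb = 1000 Mb) and ignores protocol overhead, so real-world figures will be somewhat lower; the function name is just illustrative.

```python
def gbps_to_mb_per_s(gigabits_per_second: float) -> float:
    """Convert a line rate in Gb/s to MB/s (decimal units, no protocol overhead)."""
    # 1 Gb = 1000 Mb, and 8 bits make one byte.
    return gigabits_per_second * 1000 / 8


if __name__ == "__main__":
    # Link speeds mentioned in the comment above: 2.5Gb LAN, 8Gb internet, 10Gb LAN.
    for rate in (2.5, 8, 10):
        print(f"{rate} Gb/s is roughly {gbps_to_mb_per_s(rate)} MB/s of sustained writes")
```

In other words, saturating an 8Gb connection means the disk on the receiving end has to sustain about 1000 MB/s of writes.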















  • This would ideally become standardized among web servers with an option to easily block various automated aggregators.

    Regardless, all of us combined are a grain of rice compared to the real meat and potatoes AI trains on: social media, public image storage, copyrighted media, etc. That includes all those sites with extensive privacy policies that are signing contracts to permit the use of their content for training.

    Without laws (and I’m not sure I support anything in this regard yet), I do not see AI progress slowing. Clearly, inbreeding AI models has an effect similar to inbreeding in nature. Fortunately, there is enough original digital content out there that this does not need to happen.
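The closest thing to a standard opt-out today is probably robots.txt, which is advisory only: a crawler can simply ignore it. As a minimal sketch of how such a rule behaves, the snippet below uses Python's standard `urllib.robotparser`. The robots.txt contents, the example URL, and the choice of `GPTBot` as the blocked user agent are assumptions for illustration, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"  # placeholder URL
    for agent in ("GPTBot", "SomeBrowser"):
        print(agent, "allowed" if is_allowed(agent, url) else "blocked")
```

This only expresses a preference. Actually enforcing it would need server-side blocking, which is the part that is not standardized.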





  • I want Ars content to be part of whatever training data is provided to the best models. How does that get done without appearing like they are being bought?

    Even if their contract explicitly states that it is a data sharing agreement only and the products of the media organization (articles/investigations) are not grounds for breach or retaliation, it is assumed that there is now some partiality in future reporting.

    So, for all media companies, the options seem to be:

    1. Contribute to the greater good by openly permitting site scraping (for $0)
    2. Allow data sharing to contracted parties only (for a fee)
    3. Publicly or privately prohibit use of any data, and then seek damages down the road for theft/copyright infringement when the legal framework has been established.

    Is there a GPL or other license structure that permits data sharing for LLM training in a way that it does not get transformed into something evil?


  • I pay for Nebula and try to watch as much as I can there. The content is more “pleasant department store” and less “Mexican public market”.

    I do watch YouTube regularly when channel-surfing, but if I ever see an ad (which happens only on mobile devices), I close it immediately and do something else. It’s not that I think I should be able to watch everything for $0, but YouTube ads are so jarring, random, and irrelevant that they just make me sick. They literally ruin whatever I was watching and make me sad to exist.

    It can be exhausting to wade through the absolute meat market of clickbait titles and thumbnails to find something that not only looks interesting but won’t abuse me with infomercial-style audio/visuals.

    YouTube enables and promotes the “content creators” who abuse human psychology to accumulate views, likes, subscriptions, etc. The best thing that could happen is they continue to be exposed as the drug dealer they are.