Posts
3
Comments
939
Joined
2 yr. ago

shittysuperpowers @lemm.ee

You can make people misinterpret homophones

Fuck AI @lemmy.world

Meta trained its AI on almost all public posts since 2007

Gaming @lemmy.ml

Video - Palworld Modded with Pokemon

  • To be clear, I agree that the line you quoted is almost assuredly incorrect. If they changed it to "thousands of deepfake apps powered by open source technology" then I'd still be dubious, simply because it seems weird that there would be thousands of unique apps that all do the same thing, but that would at least be plausible. Most likely they misread something like https://techxplore.com/news/2025-05-downloadable-deepfake-image-generators.html, took "model variant" (which in this context generally means a LoRA) to mean "app", and jumped too hard on the "everything is an open source app" bandwagon.

    I did some research - browsing https://github.com/topics/deepfakes (which has 153 total repos listed, many of which are focused on deepfake detection), searching DDG, clicking through to related apps from GitHub repos, etc.

    In terms of actual open source deepfake apps, let's assume that "app" means, at minimum, a piece of software you can run locally on consumer-targeted hardware (generally at least an Nvidia desktop GPU), regardless of whether you have to write custom code to use it (so long as the code is included), use the CLI, hit an API, use a GUI app, a web browser, or a phone app. Even counting only apps whose primary use case is creating deepfakes by face swapping videos, there are several:

    • Roop
    • Roop Unleashed
    • Rope
    • Rope Live
    • VisoMaster
    • DeepFaceLab
    • DeepFaceLive
    • Reactor UI
    • inswapper
    • REFace
    • Refacer
    • Faceswap
    • deepfakes_faceswap
    • SimSwap

    If you included forks of all those repos, then you'd definitely get into the thousands.

    If you count video generation applications that can imitate people using, at minimum, Img2Img and 1 Lora OR 2 Loras, then these would be included as well:

    • Wan2GP
    • HunyuanVideoGP
    • FramePack Studio
    • FramePack eichi

    And if you count the tools that integrate those, then these probably all count:

    • ComfyUI
    • Invoke AI
    • SwarmUI
    • SDNext
    • Automatic1111 SD WebUI
    • Fooocus
    • SD WebUI Forge
    • MetaStable
    • EasyDiffusion
    • StabilityMatrix
    • MochiDiffusion

    If the potential criminals use easier ready-made (commercial) web-services instead of buying a RTX 5090, learning ComfyUI, dealing with the steep learning curve etc, we’d know we have to primarily fight those apps and services, not necessarily the generative AI tools.

    This is the part where, to be able to answer that, someone would need to go and actually test out the deepfake apps and compare their outputs. I know that they get used for deepfakes because I've seen the outputs, but as far as I know, every single major platform - e.g., Kling, Veo, Runway, Sora - has safeguards in place to prevent nudity and sexual content. I'd be very surprised if they were being used en masse for this.

    In terms of the SaaS apps used by people seeking to create nonconsensual, sexually explicit deepfakes... my guess is those are actually not really part of the figure that's being referenced in this article. It really seems like they're talking about doing video gen with LoRAs rather than doing face swaps.

  • Without searching for them myself to confirm, it’s plausible, especially if you take it to mean “apps leveraging open source AI technology.”

    There are a ton of open source AI repos, many of which provide video related capabilities. The number of true open source AI models is very slim, but “Open weight” AI models are commonly referred to as open source, and from the perspective of building your app, fine tuning the model, or creating Loras for it, open weight is good enough.

    Some Loras come with details on the training data set, so even if the base model is only open weights, the Lora can still be open source.

    Until recently, Civitai had Loras for famous people, e.g., Emma Watson, and apparently for just regular people too. There was a post here last week, I think (or maybe in some other community), linking to 404 Media, about those being taken down thanks to credit card processors drawing a line in the sand at deepfake imagery.

    ComfyUI is a self hostable AI platform (and there are also many hosts that offer it) that lets you build a workflow from multiple nodes, each of which generally integrates some open source AI tech that was otherwise released. For example, there are nodes that add the capabilities to perform:

    • image generation with Stable Diffusion, Flux, Hidream, etc
    • TTS with KokoroTTS, Piper, F5 TTS, etc
    • video generation with AnimateDiff, Cog, Wan2.1, Hunyuan, FramePack, FantasyTalking, Float
    • video modification, e.g., LatentSync, which takes a video and lipsyncs it to a provided audio file
    • image manipulation, e.g., controlnet, img2img, inpainting, outpainting, or even specific tasks like “remove the background” or “change the face to this other face”

    If you think of a deepfake as just a video of a recognizable person doing a thing, you can create a deepfake by:

    • taking an existing video and swapping the face in each frame
    • faceswap video specific approaches, e.g., Roop.
    • an image to video workflow, e.g., with Wan: “the person dances.” You can expand the options available with Wan by using Loras.
    • a text to video workflow, where you use a Lora for that person
    • an image+audio to video workflow, e.g., with FantasyTalking/Float, creating a lipsync to an audio file you provide
    • a video+audio to video workflow with LatentSync to make it look like they said something different, particularly using a TTS (like F5 TTS) that does voice cloning to generate the new audio

    My suspicion is that most of the AI apps that are available online are just repackaging these open source technologies, but are not open source themselves. There are certainly some open source apps, of course, though the ones I know of are more generic and not deepfake specific (ComfyUI, SwarmUI, Invoke AI, Automatic1111, Forge, Fooocus, n8n, FramePack Studio, FramePack Eichi, Wan2GP, etc.).

    This isn’t a licensing issue, as many open source projects are licensed with MIT or Apache licenses, which don’t require you to open source derivative products. Even if they used the GPL, it wouldn’t be required for a SaaS web app. Only the AGPL would protect against that, and even then, only the changes to the AGPL library would need to be shared; the front end app could still be proprietary.

    The other issue could be them not knowing what “app” means. If you think of a Lora as an app, then the sentence might be accurate. I don’t know for sure that there were thousands of Loras for people that published their training data, but I wouldn’t be surprised if that were the case.

  • I think the best way to handle this would be to just encode everything and upload all files. If I wanted some amount of history, I'd use some file system with automatic snapshots, like ZFS.

    If I wanted to do what you've outlined, I would probably use rclone with filtering for the extension types or something along those lines.

    If I wanted to do this with Git specifically, though, this is what I would try first:

    First, add lossless extensions (*.flac, *.wav) to my repo's .gitignore

    Second, schedule a job on my local machine that:

    1. Watches for changes to the local file system (e.g., with inotifywait or fswatch)
    2. For any new lossless file that doesn't already have an accompanying lossy file (i.e., one that is collocated and has the exact same filename, sans extension, with an accepted extension, e.g., .mp3, .ogg - possibly also confirming that the codec is up to my standards with a call to ffprobe, avprobe, mediainfo, exiftool, or something similar), encodes the file to my preferred lossy format.
    3. Uses git status --porcelain to check whether there have been any changes.
    4. If so, runs git add --all && git commit --message "Automatic commit" && git push
    5. Optionally, automatically crafts a better commit message by checking which files have been changed, generating text like Added album: "Satin Panthers - EP" by Hudson Mohawke or Removed album: "Brat" by Charli XCX; Added album "Brat and it's the same but there's three more songs so it's not" by Charli XCX
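
    As a rough sketch of steps 2 through 4 (not something I've run against a real library): the function names, the extension sets, and the default .ogg target are my own assumptions, and the ffmpeg call relies on ffmpeg picking an encoder from the output extension. The file watching from step 1 is left out; you'd call sync() from whatever watcher or scheduler you use.

```python
import subprocess
from pathlib import Path

# Assumed extension sets; adjust to taste.
LOSSLESS = {".flac", ".wav"}
LOSSY = {".mp3", ".ogg"}


def missing_lossy(root: Path) -> list[Path]:
    """Return lossless files that have no collocated lossy file with the same stem."""
    missing = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in LOSSLESS:
            continue
        # Extensions of all files in the same folder sharing this filename (sans extension).
        sibling_exts = {
            p.suffix.lower()
            for p in path.parent.iterdir()
            if p.is_file() and p.stem == path.stem
        }
        if not sibling_exts & LOSSY:
            missing.append(path)
    return missing


def encode_command(src: Path, ext: str = ".ogg") -> list[str]:
    """Build an ffmpeg command; -n refuses to overwrite an existing output file."""
    return ["ffmpeg", "-n", "-i", str(src), str(src.with_suffix(ext))]


def has_changes(repo: Path) -> bool:
    """True if `git status --porcelain` prints anything for this repo."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())


def sync(repo: Path) -> None:
    """Encode anything that is missing, then commit and push if the tree changed."""
    for src in missing_lossy(repo):
        subprocess.run(encode_command(src), check=True)
    if has_changes(repo):
        subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
        subprocess.run(
            ["git", "commit", "--message", "Automatic commit"], cwd=repo, check=True
        )
        subprocess.run(["git", "push"], cwd=repo, check=True)
```

    Because the lossless extensions are in .gitignore, git add --all only picks up the encoded files. The codec check with ffprobe or similar is deliberately not included here.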

    Third, schedule a job on my remote server that runs git pull at regular intervals.

    One issue with this approach is that if you delete a file (as opposed to moving it), the space is not recovered on your local machine or your server. If space on your server is a concern, you could work around that by running something like the answer here (adjusting the depth to an appropriate amount for your use case):

    ```bash
    git fetch --depth=1
    git reflog expire --expire-unreachable=now --all
    git gc --aggressive --prune=all
    ```

    Another potential issue is that what I described above involves having an intermediary Git remote to push to and pull from, e.g., one running on a hosted Git forge like GitHub or Codeberg. This could result in getting copyright complaints or something along those lines, though.

    Alternatively, you could use your server as the git server (or check out forgejo if you want a Git forge as well), but then you can't use the above trick to prune file history and save space from deleted files (on the server, at least - you could on your local, I think). If you then check out your working copy in a way such that Git can use hard links, you should at least be able to avoid needing to store two copies on your server.

    The other thing to check out, if you take this approach, is git lfs. EDIT: Actually, I take that back - you probably don't want to use Git LFS.

  • It was already known before the whistleblower that:

    1. Siri inputs (all STT at that time, really) were processed off device
    2. Siri had false activations

    The “sinister” thing that we learned was that Apple was reviewing those activations to see if they were false, with the stated intent (as confirmed by the whistleblower) of using them to reduce false activations.

    There are also black box methods to verify that data isn’t being sent and that particular hardware (like the microphone) isn’t being used, and there are people who look for vulnerabilities as a hobby. If the microphones on the most/second most popular phone brand (iPhone, Samsung) were secretly recording all the time, evidence of that would be easy to find and would be a huge scoop - why haven’t we heard about it yet?

    Snowden and Wikileaks dumped a huge amount of info about governments spying, but nothing in there involved always on microphones in our cell phones.

    To be fair, an individual phone is a single compromise away from actually listening to you, so it still makes sense to avoid having sensitive conversations within earshot of a wirelessly connected microphone. But generally that’s not the concern most people should have.

    Advertising tracking is much more sinister and complicated and harder to wrap your head around than “my phone is listening to me” and as a result makes for a much less glamorous story, but there are dozens, if not hundreds or thousands, of stories out there about how invasive advertising companies’ methods are, about how they know too much, etc. Think about what LLMs do with text and the level of prediction that they can do. That’s what ML algorithms can do with your behavior.

    If you’re misattributing what advertisers know about you to the phone listening and reporting back, then you’re not paying attention to what they’re actually doing.

    So yes - be vigilant. Just be vigilant about the right thing.

  • proven by a whistleblower from apple

    Assuming you have an iPhone. And even then, the whistleblower you’re referencing was part of a team who reviewed utterances by users with the “Hey Siri” wake word feature enabled. If you had Siri disabled entirely or had the wake word feature disabled, you weren’t impacted at all.

    This may have been limited to impacting only users who also had some option like “Improve Siri and Dictation” enabled, but it’s not clear. Today, the Privacy Policy explicitly says that Apple can have employees review your interactions with Siri and Dictation (my understanding is the reason for the settlement is that they were not explicit that human review was occurring). I strongly recommend disabling that setting, particularly if you have a wake word enabled.

    If you have wake words enabled on your phone or device, your phone has to listen to be able to react to them. At that point, of course the phone is listening. Whether it’s sending the info back somewhere is a different story, and there isn’t any evidence that I’m aware of that any major phone company does this.

  • Sure - Wikipedia says it better than I could hope to:

    As English-linguist Larry Andrews describes it, descriptive grammar is the linguistic approach which studies what a language is like, as opposed to prescriptive, which declares what a language should be like.[11]: 25  In other words, descriptive grammarians focus analysis on how all kinds of people in all sorts of environments, usually in more casual, everyday settings, communicate, whereas prescriptive grammarians focus on the grammatical rules and structures predetermined by linguistic registers and figures of power. An example that Andrews uses in his book is fewer than vs less than.[11]: 26  A descriptive grammarian would state that both statements are equally valid, as long as the meaning behind the statement can be understood. A prescriptive grammarian would analyze the rules and conventions behind both statements to determine which statement is correct or otherwise preferable. Andrews also believes that, although most linguists would be descriptive grammarians, most public school teachers tend to be prescriptive.[11]: 26

  • The one I grabbed to test was the ROG Azoth.

    I also checked my Iris and Moonlander - both cap out at 6, but I believe I can update that to be higher with QMK or add a config key via Oryx on the Moonlander to turn it on.

  • Per this thread from 2009, the limit was conditional upon using a particular keyboard descriptor documented elsewhere in the spec, but keyboards are not required to use that descriptor.

    I tested just now on one of my mechanical keyboards, on macOS, connected via USB-C, using the Online Key Rollover Test, and was able to get 44 keys registered at the same time.
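
    For context on where a 6 key cap comes from, here's a minimal sketch, assuming the descriptor in question is the HID boot protocol keyboard report: 8 bytes, with one modifier byte, one reserved byte, and six keycode slots. The function name is mine, and this only parses a raw report; it doesn't talk to any device.

```python
BOOT_REPORT_LENGTH = 8  # 1 modifier byte + 1 reserved byte + 6 keycode slots


def parse_boot_report(report: bytes) -> tuple[int, list[int]]:
    """Split a boot protocol keyboard report into (modifier bitmap, pressed keycodes)."""
    if len(report) != BOOT_REPORT_LENGTH:
        raise ValueError("boot protocol keyboard reports are 8 bytes long")
    modifiers = report[0]
    # Bytes 2..7 hold at most six usage IDs; 0x00 means the slot is empty.
    keys = [code for code in report[2:8] if code != 0]
    return modifiers, keys
```

    With only six slots, a seventh simultaneous key has nowhere to go, which is the limit people usually quote. A keyboard that sends a different report format (for example, one bit per key) isn't bound by it, which fits with the 44 keys I saw.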

  • If you want to generate audiobooks using your own / a hosted TTS server, check out one of these options:

    If you don’t have a decent GPU, Kokoro is a great option as it’s fast enough to run on CPU and still sounds very good.

    If you’re going to use Kokoro, Audiblez (posted by another commenter) looks like it makes that more of an all-in-one option.

    If you want something that you can use without an upfront building of the audiobook, of the above options, only OpenReader-WebUI supports that. RealtimeTTS is a library that handles that, but I don’t know if there are already any apps out there that integrate it.

    If you have the audiobook generation handled and just want to be able to follow along with text / switch between text and audio, check out https://storyteller-platform.gitlab.io/storyteller/

  • From the Slashdot comments, by Rei:

    Or, you can, you know, not fall for clickbait. This is one of those...

    Ultimately, we found that the common understanding of AI’s energy consumption is full of holes.

    "Everyone Else Is Wrong And I Am Right" articles, which starts out with....

    The latest reports show that 4.4% of all the energy in the US now goes toward data centers.

    without bothering to mention that AI is only a small percentage of data centre power consumption (Bitcoin alone is an order of magnitude higher), and....

    In 2017, AI began to change everything. Data centers started getting built with energy-intensive hardware designed for AI, which led them to double their electricity consumption by 2023.

    What a retcon. AI was nothing until the early 2020s. Yet datacentre power consumption did start skyrocketing in 2017 - having nothing whatsoever to do with AI. Bitcoin was the big driver.

    At that point, AI alone could consume as much electricity annually as 22% of all US households.

    Let's convert this from meaningless hype numbers to actual numbers. First off, notice the fast one they just pulled - global AI usage to just the US, and just households. US households use about 1500 TWh of the world's 24400 TWh/yr, or about 6%. 22% of 6% is 1,3% of electricity (330 TWh/yr). Electricity is about 20% of global energy, so in this scenario AI would be 0,3% of global energy. We're just taking at face value their extreme numbers for now (predicting an order of magnitude growth from today's AI consumption), and ignoring that even a single AI application alone could entirely offset the emissions of all AI combined. Let's look first at the premises behind what they're arguing for this 0,3% of global energy usage (oh, I'm sorry, let's revert to scary numbers: "22% OF US HOUSEHOLDS!"):

    • It's almost all inference, so that simplifies everything to usage growth
    • But usage growth is offset by the fact that AI efficiency is simultaneously improving at faster than Moore's Law on three separate axes, which are multiplicative with each other (hardware, inference, and models). You can get what used to take insanely expensive, server-and-power-hungry GPT-4 performance (1,5T parameters) on a model small enough to run on a cell phone that, run on efficient modern servers, finishes its output in a flash. So you have to assume not just one order of magnitude of inference growth (due to more people using AI), but many orders of magnitude of inference growth.
      • You can try to Jevon at least part of that away by assuming that people will always want the latest, greatest, most powerful models for their tasks, rather than putting the efficiency gains toward lower costs. But will they? I mean, to some extent, sure. LRMs deal with a lot more tokens than non-LRMs, AI video is just starting to take off, etc. But at the same time, for example, today LRMs work in token space, but in the future they'll probably just work in latent space, which is vastly more efficient. To be clear, I'm sure Jevon will eat a lot of the gains - but all of them? I'm not so sure about that.
      • You need the hardware to actually consume this power. They're predicting by - three years from now - to have an order of magnitude more hardware out there than all the AI servers combined to this point. Is the production capacity for that huge level of increase in AI silicon actually in the works? I don't see it.
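
    The household arithmetic in Rei's comment is easy to reproduce. This is just a sketch using the figures quoted there (1500 TWh, 24400 TWh/yr, 22%, 20%); the variable names are mine. Dividing without first rounding the household share to 6% gives closer to 1.35% of global electricity than 1,3%, but the roughly 0,3% of global energy conclusion comes out the same.

```python
# Figures as quoted in the comment; none of these are independently sourced here.
US_HOUSEHOLD_TWH = 1500            # US household electricity use per year
WORLD_ELECTRICITY_TWH = 24400      # global electricity use per year
AI_SHARE_OF_US_HOUSEHOLDS = 0.22   # the article's "22% of all US households"
ELECTRICITY_SHARE_OF_ENERGY = 0.20 # electricity as a fraction of global energy

# 22% of US household electricity, in TWh per year.
ai_twh = AI_SHARE_OF_US_HOUSEHOLDS * US_HOUSEHOLD_TWH

# US households as a fraction of global electricity.
household_share = US_HOUSEHOLD_TWH / WORLD_ELECTRICITY_TWH

# The projected AI usage as a fraction of global electricity, then of global energy.
ai_share_of_electricity = ai_twh / WORLD_ELECTRICITY_TWH
ai_share_of_energy = ai_share_of_electricity * ELECTRICITY_SHARE_OF_ENERGY

print(f"AI: {ai_twh:.0f} TWh/yr")
print(f"US households: {household_share:.1%} of global electricity")
print(f"AI: {ai_share_of_electricity:.2%} of global electricity")
print(f"AI: {ai_share_of_energy:.2%} of global energy")
```
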
  • There’s a difference between a tool being available to you and a tool being misused by your students.

    That said, I wouldn’t trust AI assessments of students to determine if they’re on track right now, either. Whatever means the AI would use needs to be better than grading quizzes, homework, etc., and while I’m not a teacher, I would be very surprised if it were better than any halfway competent teacher’s assessments (thinking in terms of high school and younger, at least - in university IME the expectation is that you self assess during the term and it’s up to you to seek out learning opportunities outside class if you need them, like going to office hours for your prof or TA).

    AI isn’t useless, though! It’s just being used wrong. For example, AI can improve OCR, making it more feasible for students to hand in submissions that can be automatically graded, or to improve accessibility for graders. But for that to actually be helpful we need better options on the hardware front and better integration of those options into grading systems, like affordable batch scanners that you can just drop a stack of 50 assignments into, each a variable number of pages, with software that will automatically sort out the results by assignment and submitter, and automatically organize them into the same place that you put all the digital submissions.