

Yes, true, but that is assuming:
- Any potential future improvement comes solely from ingesting more useful data.
- The amount of data being produced is not ever-increasing (even excluding AI slop).
- No (new) techniques that make training more data-efficient are published or engineered.
- No (new) techniques that improve reliability are used, e.g. specializing the model for code auditing.
What the author of the blog post has shown is that it can find useful issues even now. If you apply this to a codebase, have a human triage the findings as real or false positives, and train the model to make it more likely to report real issues and less likely to produce false positives, it could still be improved specifically for this application; a toy sketch of that labeling step follows below. That does not require nearly as much data as general improvements.
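Roughly what I have in mind, as a minimal Python sketch: the `AuditFinding` record, field names, and JSONL layout are all hypothetical placeholders; any real fine-tuning or preference-optimization pipeline would define its own schema.

```python
import json
from dataclasses import dataclass

# Hypothetical record of one model-reported finding plus the human verdict.
@dataclass
class AuditFinding:
    code_snippet: str    # the code the model was asked to audit
    reported_issue: str  # the issue the model claims to have found
    is_real: bool        # human triage verdict: real issue or false positive

def to_training_example(finding: AuditFinding) -> dict:
    """Turn one labeled finding into a reward-style training example.

    Real issues get label 1.0, false positives 0.0, so a fine-tuning or
    preference-optimization pass can push the model toward real findings
    and away from hallucinated ones.
    """
    return {
        "prompt": "Audit the following code for issues:\n" + finding.code_snippet,
        "completion": finding.reported_issue,
        "label": 1.0 if finding.is_real else 0.0,
    }

def write_dataset(findings: list[AuditFinding], path: str) -> None:
    # JSONL, one example per line; most fine-tuning pipelines accept something like this.
    with open(path, "w") as f:
        for finding in findings:
            f.write(json.dumps(to_training_example(finding)) + "\n")

if __name__ == "__main__":
    triaged = [
        AuditFinding("strcpy(dst, src);", "Possible buffer overflow: no bounds check.", True),
        AuditFinding("x = x + 1;", "Integer overflow on every increment.", False),
    ]
    write_dataset(triaged, "audit_preferences.jsonl")
```

The point is just that the expensive part is the human triage; the resulting dataset is tiny compared to what general pretraining needs.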
While I agree that improvements are not a given, I wouldn't assume they can no longer happen. Despite these companies having effectively exhausted all of the text on the internet, improvements are currently still being made left, right, and center. If the many billions they are spending improve these models such that we get a fancy new tool for making our software safer and more secure: great! If it ends up being an endless money pit and nothing ever comes of it, oh well. I'll just wait and see which of the two it will be.
At least the EU is somewhat privacy-friendly here (excluding the Google tie-in) compared to the data-sharing and privacy mess the UK has created by obligating people to share ID pictures or selfies.
Proving you are 18+ through a zero-knowledge proof (i.e. the other party learns nothing beyond the fact that you are 18+), where the proof is generated locally on your own device from a government-signed date of birth (the government only issues the credential and doesn't see where you use it), is probably the least privacy-intrusive way to do this, barring not checking anything at all.
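To illustrate the statement such a proof would cover, here is a plain-Python sketch. The issuer "signature" is faked with an HMAC and the key name is a placeholder; a real deployment would use proper digital signatures and a zk-SNARK-style circuit so the verifier only ever receives a short proof, never the date of birth itself.

```python
import hmac, hashlib
from datetime import date

# Roles:
#   issuer   = government, signs the date of birth once when issuing the credential
#   prover   = the user's device, holds the signed date of birth locally
#   verifier = the website, should learn only "is 18+", never the date of birth

ISSUER_KEY = b"government-signing-key"  # placeholder; a real issuer uses an asymmetric key pair

def issue_credential(dob: date) -> bytes:
    """Issuer signs the date of birth. Done once; the issuer never sees later usage."""
    return hmac.new(ISSUER_KEY, dob.isoformat().encode(), hashlib.sha256).digest()

def age_predicate(dob: date, today: date, min_age: int = 18) -> bool:
    """The statement a zero-knowledge circuit would prove over hidden inputs:
    'I know a dob with a valid issuer signature such that age(dob, today) >= min_age'."""
    years = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return years >= min_age

def prove_and_verify(dob: date, signature: bytes, today: date) -> bool:
    # In a real system both checks run *inside* the proof; the verifier receives
    # only a proof object and learns nothing but the boolean result.
    signature_ok = hmac.compare_digest(signature, issue_credential(dob))
    return signature_ok and age_predicate(dob, today)

if __name__ == "__main__":
    cred = issue_credential(date(2001, 5, 17))
    print(prove_and_verify(date(2001, 5, 17), cred, date.today()))  # True if 18+
```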