Hacker News — vinext + Cloudflare Workers

Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark (modelrift.com)

386 points by jetter 23 hours ago | 150 comments

jhot 22 hours ago [-]

Last weekend I bought my wife a bike off marketplace. It was in good condition but was missing one of the internal cable routing grommets. I gave Claude pictures of the pill-shaped hole by itself and with my digital calipers in the long and short directions.

Gave it a short prompt and it gave me an OpenSCAD model with everything parametrized. I printed it in TPU with no changes and it was nearly perfect on the first try. Claude put in a 0.3mm subtraction in the x/y dimensions; I lowered it to 0.1 and it's perfect.
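For anyone who hasn't used OpenSCAD, here is a rough idea of what "everything parametrized" with a clearance subtraction can look like. This is a minimal sketch, not what Claude produced: `pill_plug_scad` is a hypothetical Python helper that emits OpenSCAD source for a pill-shaped (stadium) plug as the hull of two circles. It assumes the clearance is subtracted once from each measured dimension, which may not match how Claude applied its 0.3mm.

```python
def pill_plug_scad(hole_length: float, hole_width: float, height: float,
                   clearance: float = 0.1, segments: int = 64) -> str:
    """Return OpenSCAD source for a pill-shaped plug (hypothetical helper).

    Assumption: `clearance` is subtracted once from each measured hole
    dimension so the printed part slides into the hole.
    """
    length = hole_length - clearance
    width = hole_width - clearance
    if width <= 0 or length < width:
        raise ValueError("expected hole_length >= hole_width > clearance")
    radius = width / 2
    offset = (length - width) / 2  # distance from centre to each end circle along x
    # hull() of two circles gives the stadium outline; linear_extrude makes it 3D.
    return (
        f"linear_extrude(height={height:.3f})\n"
        f"  hull() {{\n"
        f"    translate([{-offset:.3f}, 0]) circle(r={radius:.3f}, $fn={segments});\n"
        f"    translate([{offset:.3f}, 0]) circle(r={radius:.3f}, $fn={segments});\n"
        f"  }}\n"
    )


print(pill_plug_scad(12.0, 6.0, 3.0))  # paste the output into OpenSCAD to preview
```

Changing the fit then means editing one number, which matches the 0.3 to 0.1 tweak described above.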

Much easier shape than ancient Roman architecture but still very cool how easy it was.

simplyluke 21 hours ago [-]

Yeah, CAD has been my personal example of "oh the barrier to entry for this skill was high enough that I didn't do it and now I can be passably bad at it enough to get some simple things done"

I've had similar experiences with making simple functional parts off a 3d printer with OpenSCAD + LLMs. I'm very aware that the models are worse at it than say, generating react code, and I'm also the antithesis of a skilled pilot. It's still cool and has resulted in me starting to learn a new skill at a hobby level.

dempedempe 18 hours ago [-]

It's like this with a lot of things now. For example, Nix's learning curve used to be a huge barrier to entry. Now with LLMs, I'm using nix-darwin and home-manager for dotfiles, package management, and have individual flakes in all of my projects for cryptographically reproducible builds!

rlt 17 hours ago [-]

Nit: there’s nothing “cryptographic” about reproducible builds.

“Reproducible build” already usually implies bit-by-bit reproducibility.

illiac786 13 hours ago [-]

“The reproducibility is cryptographically verifiable with hashes“ would be the full sentence, but it’s a mouthful.

pabs3 7 hours ago [-]

Build reproducibility checks usually use bitwise comparison, not hash comparison.

The Reproducible Builds project also wrote diffoscope, which goes quite far with helping identify where differences occur and how to fix them.

https://reproducible-builds.org/ https://diffoscope.org/ https://try.diffoscope.org/

illiac786 1 hours ago [-]

Let’s say, for the positive case, hash comparison is significantly faster.

pabs3 1 hours ago [-]

I feel like that is quite unlikely. Both the hash and bitwise comparisons read both files in both cases. In the not-equal case the hash reads the entirety of both files, so it's slower than a start-to-end bitwise comparison, which exits at the first not-equal bit. In the equal case, both read the entirety of both files. Various other bitwise strategies can be faster than start-to-end; rdfind, for example, checks the start of the file first, then the end, then the rest of the file.
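To make the trade-off concrete, here is a minimal Python sketch of both approaches. The head/tail probe is only loosely modeled on the description of rdfind above, not on rdfind's actual code, and the size pre-check is an extra shortcut added for illustration. `files_equal_bitwise` can stop early on a mismatch, while `files_equal_by_hash` must read both files completely before it can answer. Hashing still earns its keep when you only have a published digest and not the other file.

```python
import hashlib
import os

CHUNK_SIZE = 1 << 16  # 64 KiB per read
PROBE_SIZE = 4096     # bytes checked at the head and tail before a full scan


def _read_at(f, offset: int, size: int) -> bytes:
    f.seek(offset)
    return f.read(size)


def files_equal_bitwise(path_a: str, path_b: str) -> bool:
    """Byte comparison that bails out at the first difference."""
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        return False  # different lengths can never be equal
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        # Cheap probes first: the start of the file, then the end.
        for offset in (0, max(size - PROBE_SIZE, 0)):
            if _read_at(fa, offset, PROBE_SIZE) != _read_at(fb, offset, PROBE_SIZE):
                return False
        # Full start-to-end scan, exiting at the first mismatching chunk.
        fa.seek(0)
        fb.seek(0)
        while True:
            a = fa.read(CHUNK_SIZE)
            b = fb.read(CHUNK_SIZE)
            if a != b:
                return False
            if not a:  # both files exhausted together
                return True


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(block)
    return h.hexdigest()


def files_equal_by_hash(path_a: str, path_b: str) -> bool:
    """Always reads both files completely before it can answer."""
    return sha256_of(path_a) == sha256_of(path_b)
```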

dekhn 11 hours ago [-]

yes, but it's still not cryptological, it's just verification using hashes.

fc417fc802 11 hours ago [-]

The hash being cryptographically secure is significant. In contrast, you could use (for example) md5 to non-cryptographically verify that the full process matched.

dekhn 9 hours ago [-]

Sorry, the point I was making is that this isn't cryptography- it's the properties of a cryptographic hash (hard to spoof) that are useful. I don't think any verified build program uses the hash to encrypt data at any point. If I'm wrong on this point, that's fine, but please include a link.

fc417fc802 8 hours ago [-]

Sure, "verified in a cryptographically secure manner" is technically not equivalent to "cryptographically verified" but the response "it's not cryptographic" is rather ambiguous at best given that it is, in fact, a cryptographically secure manner of verification. The key observation here being that an algorithm or process being "cryptographically secure" does not mean that it is "cryptographic" in nature (ie implements or uses cryptography).

dempedempe 16 hours ago [-]

I meant with Nix you're comparing hashes. With Docker, you're using pinned versions

bt1a 17 hours ago [-]

i thought it mainly implied architectural/hardware compatibility and deterministic output

aidenn0 10 hours ago [-]

Nix mostly does not guarantee deterministic output. It rather guarantees deterministic inputs, and then sandboxes the system to inhibit the build from accessing the outside world.

Deterministic inputs do not always imply deterministic outputs.
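A classic way identical inputs still yield different outputs is a build step that stamps the current time into its artifact. The sketch below is a hypothetical example using Python's `tarfile`, not Nix's machinery: `build_archive` packs the same files twice, and the outputs differ unless the timestamp (and member order, owner, mode) is pinned.

```python
import hashlib
import io
import tarfile


def build_archive(files: dict[str, bytes], mtime: int) -> bytes:
    """Pack `files` into an uncompressed ustar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order, independent of input order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime        # this timestamp ends up in the output bytes
            info.mode = 0o644
            info.uid = info.gid = 0   # don't leak the builder's user/group
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()
```

Passing `int(time.time())` as `mtime` reproduces the problem: two builds from the same inputs hash differently. Reading the timestamp from the `SOURCE_DATE_EPOCH` environment variable, the Reproducible Builds convention, makes rebuilds byte-identical.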

pabs3 7 hours ago [-]

Indeed, the Reproducible Builds community is working on fixing non-deterministic build output https://reproducible-builds.org/

pimeys 17 hours ago [-]

Nix is also great at work. You keep the server nix code in the same repo and OpenCode can just change and test server config.

0x696C6961 21 hours ago [-]

Learning to make simple parts in onshape is pretty darn easy (and fun).

simplyluke 7 hours ago [-]

I mean, like any other skill that has pretty much been my experience (though I tried fusion + openscad), but there is something about being able to ask a computer all the dumb noob questions that makes that first phase easier.

jeffbee 19 hours ago [-]

Yeah. I teach this after school to 7th grade kids. Anyone can pick this up in a few hours.

chalupa-supreme 18 hours ago [-]

They taught us to make Lego bricks with CAD when I was in 6th grade. I wish I had retained more of that, and that it were more widely taught.

jeffbee 16 hours ago [-]

I am reasonably confident that access to solid modeling and additive fabrication is now more widespread than ever.

k-Whale 8 hours ago [-]

same — LLMs turn skills i'd parked for years into 'just try it' territory, which is genuinely new.

skinner927 17 hours ago [-]

Claude does well if you can provide all dimensions. It fails at guessing, though. The real magic is when you can provide one dimension, or a photograph with a ruler in it, and the AI figures the rest out. Right now Claude, at least, is pretty bad at guessing.

jonah 19 hours ago [-]

I was recently trying to get models to generate a 3D fortune cookie. Claude in three.js and Gemini in openSCAD. Neither really got the concept or could get very close at all. It's a surprisingly complex shape I guess.

car 10 hours ago [-]

Probably easier with Trellis 2 or Meshy.ai

8note 16 hours ago [-]

with that shape you probably want something that's good at bends/fabric

'cause you'd start with the flat shape, then set some constraints that certain edges are collinear

jetter 21 hours ago [-]

these small functional prints are exactly where OpenSCAD and LLM generation shines

amelius 21 hours ago [-]

Does it optimize for no support?

05 20 hours ago [-]

You optimize for no support when selecting print orientation (but for anything semi-cylindrical like the part described, that would be the only sane orientation, and the one the slicer would choose when you smash the 'Auto Orientation' button).

jlhawn 17 hours ago [-]

> Antigravity was the only autonomous agent that implemented the Pantheon’s signature interior ceiling pattern: repeated square coffers visible through the oculus.

That is seriously impressive. I looked at the 3D model and didn't even think to LOOK INSIDE the building before reading this.

Here's [1] the 3D model with `show_cutaway` enabled.

[1] https://modelrift.com/models/pantheon-benchmark-antigravity-...

nancyminusone 13 hours ago [-]

I can't decide whether it's good or bad that it has included outside information clearly not present in the prompts to make the model. Clearly it's the right thing to do if you want "the Pantheon", but I don't think any draftsman or engineer would find this acceptable work.

hereme888 17 hours ago [-]

Was just going to say.... I looked inside by accident, and it gives a better impression of intelligence and effort than the outside.

mellosouls 23 hours ago [-]

Antigravity may well Top the whatever benchmark but:

My Antigravity (forced) replacement for Gemini CLI requires me to log on via browser every time I use it, and my Antigravity IDE won't update at all, so:

If it's ok I'd prefer they just work on reaching a baseline acceptable rollout before worrying about being Top in anything.

Ps actual title:

OpenSCAD LLM Benchmark: Building the Pantheon

jetter 22 hours ago [-]

I agree, my main concern regarding Google AI products is this endless pain around the UX of login / billing / upgrades / product sunsets... but their LLM models are good and Antigravity 2.0 is not that bad either (unless you lost all your Antigravity 1.0 setup and projects, like many people did)

pelagicAustral 22 hours ago [-]

I just use Claude Code and intellij, so I don't understand why so many people complain about Antigravity ditching VS Code, what's the surface not covered by using Antigravity CLI + VS Code (or any other IDE)?

jeromegv 21 hours ago [-]

Gemini CLI was open source. Antigravity CLI is not. It's not at feature parity, it's missing many features, and now we are forced to migrate away from Gemini CLI before Antigravity CLI is ready.

surajrmal 21 hours ago [-]

The difference in its ability is immense. Even with fewer features it makes a lot of sense to switch. It really shows that the harness matters almost as much as the model.

lern_too_spel 17 hours ago [-]

At least one of the missing features is a basic piece of functionality (showing token quota used). Without it, you're pretty much guaranteed to get locked out for a week with no warning.

freedomben 22 hours ago [-]

I'm not GP, but I am somewhat excited about Antigravity CLI. I adopted Gemini CLI early and really liked it, though over time it got dumber and dumber until I realized it was foolish to use it instead of Claude/Codex. I'm hopeful that Antigravity CLI won't go down that path, but I also can't fight some skepticism.

jeromegv 21 hours ago [-]

I don’t think it’s the cli that was dumber, just the model it was using. They drastically reduced limits on their best model so that’s likely how you got stuck downgrading model and getting worse results.

WarmWash 20 hours ago [-]

I'm sensing in reality that behind the scenes there is a difficult trade-off between quantization and usage limits. You can have a "smart" model but poor limits, or good limits and a "dumb" model.

This seems very similar to mobile data limits (remember those years?), where there wasn't enough tower bandwidth to serve everyone unlimited data, so telcos were in constant tension between data caps and bandwidth throttling.

It wasn't until 5G came along with 100x network capacity that they could finally give everyone "unlimited" data.

antonvs 8 hours ago [-]

Which plan were you on? Gemini CLI auto-downgrades the model if you run out of tokens on the better models. It gives you stats about the models used at the end of every session.

mchusma 17 hours ago [-]

I just left Google I/O feeling less confident about Google's execution here.

- Gemini 3.5 Flash is strange. Old cutoff, better than 3.1 Pro at some things and worse at others, sometimes cheaper and sometimes more expensive than 3.1 Pro.

- Antigravity had seemed abandoned, and people speculated they would cut it off, and they kind of did, migrating everyone to a new Antigravity.

- Google "shipped the org chart": they have so many AI products and none seem best of breed (e.g. the Gemini integration in Google Docs is worse than Claude).

I was actually hoping for "Opus level intelligence at Haiku costs" model or "Sonnet level performance in Gemini 3.0 pricing", either of these would have been a workhorse, plus a competitor to Claude/Codex (1 app to do things). I got neither.

bitpush 6 hours ago [-]

The cut off doesn't matter since all of them use tools.

VectorLock 22 hours ago [-]

The forced upgrade from Gemini CLI, which I liked as much as (and in some ways more than) Claude Code, was bad. But them just sending out that email on Wednesday that basically said "Thanks for subscribing to Google One AI Pro, as of right now we're adding limits to your account. Tough shit you get nothing." left a REALLY bad taste in my mouth. I had previously praised the "AI Pro" subscription as a good value.

leoedin 21 hours ago [-]

I quit AI Pro earlier this year for the same reason. I went to use it one day (I don't think I'd even used it much in the preceding week) and found that my limits had been reduced overnight and my usage was already too high. I had something like a 7 day wait until it reset.

I get you have to change limits, but reducing limits in a way which both applies retroactively and has a really long reset period is just infuriating. If they'd applied the new limits more gently or at the next billing period I'd probably have continued paying.

I don't mind paying a fair price for a service that provides value, but I really hate having a service I think I'm paying for rug-pulled with no clear justification.

freedomben 22 hours ago [-]

Having my workflow disrupted is the main reason I never adopted Antigravity, despite liking it. I'm glad to see G is invested, but the older I get the more protective I am of my workflow.

hootz 22 hours ago [-]

And the only realistic way to protect our workflow is by avoiding vendor lock-in like the plague.

freedomben 15 hours ago [-]

Exactly. I admit it's a bit extreme, but this is a big reason why I insist that neovim is my IDE, and I won't adopt anything else. If I can't make it work in neovim, I will move to something else (unless I have no choice, but that happens very rarely at this point).

arthurtully 15 hours ago [-]

I've got an AI pro plan and haven't been able to log in for months. Endless checking in with my google support guy. At least Dinesh wishes me good health every week, so that's nice.

the_real_cher 22 hours ago [-]

Wild that it doesn't cache the creds.

elaus 22 hours ago [-]

Just to clarify: I believe it should cache them (it works for me).

So far I like it much more than Gemini CLI (my previous daily driver for personal projects). Seems more mature and "feels more intelligent" (very subjective ofc)

timdorr 18 hours ago [-]

It does. It uses go-keyring under the hood, which has its own issues with certain systems.

If you're on WSL, getting dbus to work is a PITA. There may be other OS-level issues that folks are running into.

dezgeg 18 hours ago [-]

It requires a keyring service being installed (accessed over dbus) and if there isn't one it just silently doesn't store them anywhere. Pretty bad UX.

littlecranky67 21 hours ago [-]

My (unfounded) guess is this is to prevent usage by other tools/openclaw. The browser login will have fingerprinting to make sure you are a human.

stuaxo 21 hours ago [-]

"Pantheon" bloody hell, why is it people writing these articles are so up themselves, it's so overbearing.

tpmoney 21 hours ago [-]

The article is literally about asking these models to generate 3d models of the Pantheon.

kaashif 7 hours ago [-]

Indeed. At least an LLM would've read the article and realized that.

ponyous 20 hours ago [-]

I've run tons of benchmarks for OpenSCAD across all kinds of models and setups, and what I realised is:

- Models are very jagged (might excel in one type of 3d model, but not another)

- Gemini models are the least jagged in my experience and have the best image understanding

- Gemini models are also the most creative (which may be undesirable if you want precise CAD part)

- Overall this benchmark doesn't prove much, because one 3D model (and one attempt) is just not enough. I usually test on at least a dozen models, each generated 3 times, and really should do much more, but it's too pricey for a solo dev.

Still, thanks for publishing this. Will definitely run Flash 3.5 soon to see how it performs.

willis936 7 hours ago [-]

OpenSCAD doesn't do curves. It's useless. I'm not sure why it continues to get so much attention.

tjoff 15 hours ago [-]

I've had such a bad time trying to do this myself. You might get a half-way decent draft on the first try and then you start to "debug" this and after a very frustrating session you realize that the model can't properly "see" the results. That is, you just can't iterate on it, at all.

I'm guessing that most harnesses/tools resize an image before processing, and in doing so lose enough detail to make it much harder to reason about, especially wireframe images.

I'm sure I'm holding it wrong, but this test didn't really test this. It was just a one off. That breaks down pretty quickly and especially if you don't have reference pictures of what you are trying to create.

1970-01-01 20 hours ago [-]

Creating a single real-world object and declaring it a benchmark? No, it doesn't work that way for a robust tool. You need to do something like Iron Chef, with a Greek architecture theme and a panel of judges that declares the winner. This is just seeing which tool subjectively makes the best-looking Pantheon.

Eridrus 20 hours ago [-]

Yeah, this is less of a benchmark and more "I like this one guys!".

Just totally subjective grading criteria of a single poorly defined example with no end use case in mind to guide how to even do evaluation.

davej 14 hours ago [-]

It's still interesting in a similar way to Simon Willison's Pelicans on a bicycle.

Eridrus 14 hours ago [-]

The Pelicans are mostly just entertainment.

dhfbshfbu4u3 22 hours ago [-]

Still a long way from shorting Autodesk.

As a side note Autodesk released an agentic assistant back in December for Fusion. Six months later it is still quite bad.

hobofan 21 hours ago [-]

It is almost comically bad. I've had a few simple parts to design for 3d printing in the last weeks and tried it with them (each are about 4 operations on the timeline), and it never created close to what I was trying to do even if spelled out step by step according to Fusion naming.

At this point I'm not even sure if it can properly create a simple primitive solid.

blorenz 19 hours ago [-]

Have you yet tried the Fusion MCP that was launched last month? https://aps.autodesk.com/blog/bringing-fusion-claude-creativ...

shideneyu 18 hours ago [-]

Still a long way to go, but I'm sure it will get there eventually.

mvkel 5 hours ago [-]

I'm working on a parenting tech device and the enclosure for it is completely AI generated. I hadn't a clue where to even start with 3D modeling, and an LLM taught me that it's code like anything else.

Weirdly, Opus 4.5 one-shotted it perfectly, but this was right before the nerfing controversy, and it's been very difficult to make even minor tweaks to the enclosure ever since.

It's like Opus went from an expert shape rotator to not having any idea what it's working on.

ljlolel 5 hours ago [-]

Ditto for my enclosure for https://quill.lorehex.co/feather

4.7 has been fine for making edits though

seemaze 17 hours ago [-]

I'm unconvinced, this is one of the most iconic historical buildings with tomes written about it and plenty of existing photographs and public models to train on.

I would be more interested in benchmarking the modeling of an anonymous structure based on provided references alone. It kind of feels like the shallow magic of watching an LLM one-shot a to-do app.

ecshafer 13 hours ago [-]

The fact that the model recreated the interior dome pattern is concerning and makes me think this didn't test what they think. The interior dome pattern isn't visible in either picture. So it took the picture and the name, then, either via search or training data, knew that there should be an interior pattern. So it could be getting information on the Pantheon's dimensions or existing models, whereas other models might be building purely from what is seen in the reference pictures.

thanhhaimai 13 hours ago [-]

From the article:

> Antigravity was the only autonomous agent that implemented the Pantheon’s signature interior ceiling pattern: repeated square coffers visible through the oculus.

The article also includes a video showing the patterns visible through the roof oculus.

lukax 4 hours ago [-]

I wonder what would happen if they used Kimi 2.5 directly instead of Cursor Composer 2.5. Composer is a fine tune of Kimi. Probably they didn't want to test "Chinese" models.

seniorsassycat 17 hours ago [-]

I tried Claude code designing a snap fit, vase mode printed box. Ultimately didn't work out, it couldn't get the tolerances right and kept designing features that wouldn't print in vase mode.

Scad needs unit tests. It would be powerful to assert that a profile doesn't have a slope greater than 45°, that the intersection of two objects is null, or that a part has a specific volume.

It also needs cut away views. I got okay results using boxes to remove everything except a sliver, to view a slice and internal details. But without hash marks, texture, or outlines it can be hard to tell the forms.
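Two of those assertions can at least be approximated outside OpenSCAD on an exported triangle mesh. This is a minimal sketch under stated assumptions: triangles are already parsed into Python tuples (for example from an STL export), winding is counter-clockwise so normals point outward, +Z is the print direction, and faces on the lowest Z plane are the build plate. `overhang_faces` and `mesh_volume` are hypothetical helpers, not an OpenSCAD feature, and the intersection check is not covered.

```python
import math

Vec = tuple[float, float, float]
Tri = tuple[Vec, Vec, Vec]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def overhang_faces(tris: list[Tri], max_angle_deg: float = 45.0,
                   eps: float = 1e-9) -> list[int]:
    """Indices of downward-facing triangles tilted more than max_angle_deg past vertical."""
    plate_z = min(v[2] for tri in tris for v in tri)
    limit = math.sin(math.radians(max_angle_deg))
    bad = []
    for i, (a, b, c) in enumerate(tris):
        n = _cross(_sub(b, a), _sub(c, a))
        length = math.sqrt(_dot(n, n))
        if length < eps:
            continue  # degenerate triangle, no meaningful normal
        if all(abs(v[2] - plate_z) < eps for v in (a, b, c)):
            continue  # resting on the build plate, needs no support
        if n[2] / length < -limit:  # normal points down more steeply than allowed
            bad.append(i)
    return bad


def mesh_volume(tris: list[Tri]) -> float:
    """Volume of a closed, consistently wound mesh as a sum of signed tetrahedra."""
    return sum(_dot(a, _cross(b, c)) for a, b, c in tris) / 6.0
```

In a test suite these become one-liners such as `assert overhang_faces(mesh) == []` or `assert abs(mesh_volume(mesh) - expected) < tol`.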

gbgarbeb 15 hours ago [-]

"Vase mode snap-fit box" sounds to me like "flexible concrete".

saifulhuq 7 hours ago [-]

[flagged]

jonasmaturana 13 hours ago [-]

I've been using Claude to generate OpenSCAD scripts for the last few months, then exporting to Bambu Studio. Never really liked the OpenSCAD editing part though, so I built a little personal tool: https://webscad.aicentralen.dk/

One neat thing is that each color becomes a separate object on export to Bambu Studio, so it's easy to assign different filaments. One of the first things I made with it was these multicolored tag keys: https://webscad.aicentralen.dk/examples/name-tag

samcheng 13 hours ago [-]

This being Hacker News, it's worth mentioning that Bambu have been exceedingly bad actors around the AGPL and Open Source.

https://www.youtube.com/watch?v=3W5NNiHnviU

debarshri 22 hours ago [-]

I have been using GPT 5.5 to build a video game. The benchmark sounds about right. It generates assets and sprites that are good enough, if not close to AAA-level games. Will check Antigravity now.

phn 22 hours ago [-]

Would you be able to share a bit about your workflow? Have been meaning to try AI gen for game models, and would love to know how people are tackling this.

debarshri 21 hours ago [-]

I have a lot to share. I'm writing a blog post about it. I'll share it along with the game.

roflcopter69 21 hours ago [-]

Sounds interesting! Please don't forget to link that in this comment thread :)

okandship 2 hours ago [-]

benchmarks should probably separate syntax validity from manufacturable output

thedougd 17 hours ago [-]

I've been trying out MCP servers for FreeCAD, with mixed results.

One area I had near magic was providing a land survey which includes details in writing of the plat. It took those directions and beautifully reconstructed the boundaries to exact precision in CAD.
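For context on what reconstructing a plat from its written description involves: the boundary is usually a sequence of calls, each a bearing plus a distance, walked from a starting point. The sketch below is a hypothetical illustration, not the MCP server's code. It assumes quadrant bearings like `N 45°30'15" E`, plane coordinates (x east, y north), and straight-line calls only; real plats can also contain curve calls, which it does not handle. The misclosure at the end is a quick sanity check that the calls were read correctly.

```python
import math
import re

# Quadrant bearings such as  N 45°30'15" E  (degrees, optional minutes/seconds).
BEARING_RE = re.compile(
    r"^\s*([NS])\s*(\d+)°\s*(?:(\d+)')?\s*(?:(\d+(?:\.\d+)?)\")?\s*([EW])\s*$"
)


def bearing_to_azimuth(bearing: str) -> float:
    """Convert a quadrant bearing to an azimuth in degrees clockwise from north."""
    m = BEARING_RE.match(bearing)
    if not m:
        raise ValueError(f"unrecognised bearing: {bearing!r}")
    ns, deg, minutes, seconds, ew = m.groups()
    angle = int(deg) + int(minutes or 0) / 60 + float(seconds or 0) / 3600
    if angle > 90:
        raise ValueError(f"quadrant bearing exceeds 90°: {bearing!r}")
    if ns == "N":
        return angle if ew == "E" else 360.0 - angle
    return 180.0 - angle if ew == "E" else 180.0 + angle


def traverse(calls: list[tuple[str, float]],
             start: tuple[float, float] = (0.0, 0.0)) -> list[tuple[float, float]]:
    """Walk (bearing, distance) calls and return every corner point."""
    x, y = start
    points = [(x, y)]
    for bearing, distance in calls:
        az = math.radians(bearing_to_azimuth(bearing))
        x += distance * math.sin(az)  # easting
        y += distance * math.cos(az)  # northing
        points.append((x, y))
    return points


def misclosure(points: list[tuple[float, float]]) -> float:
    """Gap between the end of the traverse and its start (0 for a perfect loop)."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    return math.hypot(xn - x0, yn - y0)
```

The resulting corner points can then be handed to a CAD sketch as fixed coordinates.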

Where I ran into trouble was creating good constraints on sketches without being overly explicit. I kept running into it creating distance constraints from an arbitrary point instead of using other elements in the diagram that a human drafter would think to do by default.

lithiumii 18 hours ago [-]

That's actually a reason for me to try it again. My past attempts to use LLMs for OpenSCAD have greatly improved my own OpenSCAD skills.

sjia 15 hours ago [-]

Isn't CadQuery closer than OpenSCAD to professional CAD / mechanical engineering workflows? Not sure which model (ChatGPT, Gemini, or Claude Code) is better at CadQuery code generation.

coderenegade 8 hours ago [-]

It is, but they have different use cases. CadQuery uses a geometry kernel that does boundary representation, which you need for path generation for modern manufacturing tooling. OpenSCAD produces a standard mesh representation (i.e. triangles), which is insufficient for cutting and subtractive manufacturing, but often fine for additive manufacturing (3D printing).
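One consequence of the mesh representation is that curves become flat facets. In OpenSCAD, `$fn` sets how many fragments a circle is split into, and the worst-case gap between the true circle and the inscribed polygon is the sagitta r(1 − cos(π/n)). The sketch below is plain geometry under that inscribed-polygon assumption, not OpenSCAD's own fragment logic; it estimates how many segments keep that error under a chosen tolerance.

```python
import math


def chord_error(radius: float, segments: int) -> float:
    """Max gap between a circle and an inscribed regular polygon (the sagitta)."""
    return radius * (1 - math.cos(math.pi / segments))


def segments_for_tolerance(radius: float, tolerance: float) -> int:
    """Smallest segment count whose chord error stays within `tolerance`."""
    if not 0 < tolerance < radius:
        raise ValueError("tolerance must be positive and smaller than the radius")
    return math.ceil(math.pi / math.acos(1 - tolerance / radius))
```

For a 10mm radius, 64 segments leave roughly 0.012mm of deviation, which is below what most FDM printers resolve; for toolpaths that need exact arcs, a boundary-representation kernel avoids the approximation entirely.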

UncleOxidant 12 hours ago [-]

Antigravity is an agent/harness not a model. It should say Gemini 3.5 High Tops OpenSCAD Architectural 3D LLM Benchmark.

usermac 18 hours ago [-]

I've been using LLM's to do my OpenSCAD work for over two years now. It's always where I start (and end).

faangguyindia 22 hours ago [-]

Why are specialized CAD making LLM models not showing up? In future are we going to have same model for everything? from programming to creative writing to CADs?

coderenegade 8 hours ago [-]

There are good information theoretic reasons to suspect that general models will be better than specialized ones, because knowledge and skills often overlap different areas, sometimes in surprising and unintuitive ways.

And yes, I'm aware that that statement might seem to fly in the face of much of the past two years of industry development, where specialized models have been in vogue. I think they'll settle to being appropriate for low cost "good enough" applications, but I'm less convinced they'll have anywhere near the fidelity of larger frontier models.

embedding-shape 22 hours ago [-]

If you have a model that only knows how to do CAD modeling, but doesn't know history and wasn't trained on the visual language of said history, how is it supposed to be able to model the Pantheon in the first place? It'd only be able to model exactly what you can describe with text, or even worse, exactly what it could visually extract from images via the vision encoders (for "vision models"), but it'd be a far cry from what you see in this blog post, would be my guess.

xnx 22 hours ago [-]

> In future are we going to have same model for everything?

A model that knows more in general, will often be better at specific tasks. e.g. If you ask a model to "make a program that estimates the annual production of a solar installation", it needs to have been trained on a lot more than just Python code.
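To make that example concrete: even the crudest annual-yield estimator encodes domain knowledge that has nothing to do with Python syntax. This is a rule-of-thumb sketch, not a real PV modeling tool (those model irradiance, tilt, shading and temperature in detail); `estimate_annual_kwh` is hypothetical and the 0.8 performance ratio is an illustrative default, not a measured value.

```python
def estimate_annual_kwh(system_kwp: float, peak_sun_hours: float,
                        performance_ratio: float = 0.8) -> float:
    """Rule-of-thumb yield: rated power x daily peak sun hours x 365 x losses.

    peak_sun_hours: site-specific average daily irradiation (kWh/m^2/day),
        i.e. equivalent hours of full 1 kW/m^2 sun.
    performance_ratio: lumped losses (inverter, temperature, soiling, wiring).
    """
    if system_kwp <= 0 or peak_sun_hours < 0:
        raise ValueError("system size must be positive and sun hours non-negative")
    if not 0 < performance_ratio <= 1:
        raise ValueError("performance_ratio must be in (0, 1]")
    return system_kwp * peak_sun_hours * 365 * performance_ratio
```

Every parameter here (what a kWp is, where sun hours come from, which losses matter) has to come from knowledge outside programming.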

JumpCrisscross 18 hours ago [-]

> A model that knows more in general, will often be better at specific tasks

Is this your hypothesis or broad conclusion among AI experts?

lifty 21 hours ago [-]

You might combine a general world model with a python coding model in that case. Not sure if it's better, just saying.

lukeschlather 17 hours ago [-]

What's the difference between a "general world model combined with a python coding model" and a multimodal LLM?

pshirshov 18 hours ago [-]

That's curious, I've been trying to do some parametric modeling with Claude - and its performance was abysmal.

plumeria 18 hours ago [-]

I've had a positive experience building a library of parametric HVAC duct parts using Claude, Gemini and Codex using build123d (they all review the specs and code collaboratively).

a3w 22 hours ago [-]

Claude Code 2.1 / Opus 4.7 looks best to me: the dome and ceiling structure are more correct than in the others.

Why is it ranked medium, and not on par with the best two?

WarmWash 20 hours ago [-]

Look at a picture of the Pantheon, the dome isn't as dome-like as you would imagine. It's more like a hump shape.

andybak 20 hours ago [-]

Dome looks wrong to me. Look at a few other photos - it's far from being a hemisphere

emmanuelsemugga 17 hours ago [-]

This is a really important project. Preserving humanity’s knowledge and making it openly accessible, including in formats usable by AI systems, feels like one of the most valuable things happening right now. Thank you for the clear technical instructions and the bulk download options.

Projects like Anna’s Archive make it much easier for researchers and builders to work responsibly with large datasets.

ReptileMan 22 hours ago [-]

The only thing moving faster than AI these days is the goalposts. Three years ago we would have been amazed if models were able to produce anything; now we have the luxury of nitpicking. Even the worst entries in the benchmark are quite impressive.

alnwlsn 17 hours ago [-]

Using reference images is a huge step for this sort of thing. The text-only approaches I've seen before were never going to be that good even with "perfect" AI, simply because describing 3D objects in text is not something that anyone is really any good at.

WarmWash 20 hours ago [-]

I remember getting wound up about latency and server issues playing counter-strike in the early '00s. At the same time though, it was hard to justify being angry because playing a multiplayer game with friends who were scattered all over town was something that had to be real magic.

I guess the wow!->adjust->complain->wow!->... cycle is endless as a human

ramon156 22 hours ago [-]

No one asked for faster horses, they still became obsolete when cars came. Nothing new

happyopossum 19 hours ago [-]

> No one asked for faster horses

Err, yes they did. Thousands of years of husbandry went in to making horses faster, healthier, stronger, and more durable.

I think the quote you’re looking for is “if I had asked people what they wanted, they would have said faster horses”. It’s attributed to Henry Ford, although there is debate about whether or not he said it.

The point of the quote is that “faster horses” is the consumer response to “how do I get more work done” as it comes from the viewpoint of “how am I doing my work now”. An ingenious mind looks at the desired outcome and works backwards and may come to a different and dramatically improved solution instead of merely improving the current tool.

LatencyKills 22 hours ago [-]

Things mature, and expectations grow appropriately. That is true of more than just LLM performance.

xnx 22 hours ago [-]

Sure, but it's good to have some perspective and some awe that any of this would've been absolutely unbelievable magic just 3 years ago. Even if all AI progress stopped immediately, we'd need 10 years to digest and incorporate the technology.

nutjob2 18 hours ago [-]

Why look back in awe when technological innovation will just keep accelerating. Soon what we have today will seem quaint. Best to keep looking forward with impatience and discontent.

LatencyKills 22 hours ago [-]

As someone who's been building developer tools (Visual Studio and Xcode) for 25 years, I don't have a perspective problem. We were doing "code completion" back in the 90s and could never have predicted that an LLM would write code at the current level of quality.

My point is that with every new model release, the expectations grow. I don't know how else to say that.

nutjob2 18 hours ago [-]

Welcome to human nature.

megiddo 22 hours ago [-]

This would be the same Antigravity 2.0 that "surprise, no longer an IDE, did I forget to mention that? Lolol."

kyrra 21 hours ago [-]

I'm a googler, opinions are my own.

My take is that it's a fancy wrapper around the CLI tool. It's there to organize multiple conversations and see all the related output and generate files.

I've been using the internal version and I've actually liked it quite a bit. It was clear from when I started using it that it's not an editor, and they have ways to open your normal editor outside of it. They have turned it fully into an agent management tool.

When the antigravity development team doesn't have to focus on all the things that vscode is already good at, it lets them simplify the UI and do only agent related things. We'll see if this bet works out for them, but so far I like the idea.

jdw64 22 hours ago [-]

To be brutally honest, I'm disappointed with antiGravity. It feels incredibly unGoogle-like. The AI billing models are fragmented, and the AntiGravity IDE is currently tripping over something as trivial as a basic Electron deployment config bug.

Don't get me wrong, I don't think AI coding is a bad thing. For East Asians like myself, it levels the playing field with Westerners, so as long as you rigorously review the AI's output, it's a perfectly viable tool.

However, the absolute farce we just witnessed with the antiGravity2.0 update really raises doubts about whether 'vibe coding' can actually be trusted. If even a behemoth like Google is dropping the ball like this, it says a lot.

NortySpock 20 hours ago [-]

> I don't think AI coding is a bad thing. [...] it levels the playing field [...]

I'd like to put regional differences aside and say AI coding / LLMs are incredible tools.

While I'm nervous about my job as a programmer being able to pay a prevailing wage after the dust settles, I do hope that everyone gaining access to an AI coder / tutor will allow anyone to be able to achieve things they previously only dreamed of. If the tutor costs pennies per session, sure, the tutors are out of work, but I hope everyone can thus up-skill to work on the challenges they actually want to work on.

I'm taking baby steps into coding in Elixir on the other monitor, a language I had only read about before, because an LLM is walking me through the changes, answering my questions, and accepting my rebuttals. There's no way I would have time to pick up the language otherwise.

Yesterday I vibe-coded some additions to the static site generator python script for my blog. It was awesome to be able to think in terms of desired features instead of digging around documentation for libraries and syntax.

embedding-shape 22 hours ago [-]

> AI billing models are fragmented ... IDE is currently tripping over something as trivial ... farce we just witnessed with the antiGravity2.0 update

I'm sorry, but that sounds exactly like almost every single Google "product" out there. They seem to care only about throwing stuff over the wall as quickly as possible, and you'd have a hard time finding a single Google product that doesn't also feel full of fragmented choices, as if every project of theirs had a different project manager every week.

nutjob2 18 hours ago [-]

> For East Asians like myself, it levels the playing field with Westerners

Why do you say that? Are there language or cultural disadvantages to being East Asian?

jdw64 16 hours ago [-]

It’s less about overt disadvantages and more about the practical linguistic and cultural friction we face when reading documentation or engaging with the community. For instance, the open-source ecosystem is deeply rooted in US-centric culture, heavily relying on Western idioms, jokes, and implicit cultural context in discussions. I wouldn't call these 'disadvantages,' but they certainly act as a handicap and add an extra layer of cognitive load to our learning curve.

Onplana 20 hours ago [-]

Going to try it. Just downloaded. Will see how it compares to Claude Code.

anony-123 20 hours ago [-]

So, given this benchmark, does that mean Antigravity is better than Claude Code with the Opus model? I once tried Antigravity and it was just disappointing.

dilap 19 hours ago [-]

Why Codex GPT-5.5 High instead of Extra High, I wonder?

u8 20 hours ago [-]

It's crazy how I can see articles like this, but in my practical everyday use Antigravity is a horrible consumer experience. The TUI is broken: you cannot type input while the model is outputting text, otherwise both get mangled and the TUI renders a sickly blob of text. There are no keyboard shortcuts to switch between planning and execution mode, and no way to directly load skills.

The usage limits are too aggressive, too. I tried to generate a quick Deno Fresh website to act as a redirect to my GitHub from socials (literally the simplest possible thing I could have asked of it) and it chewed through my five-hour limit in tokens just on scaffolding.

To me, as a developer of CLI developer tooling, it's obvious not a lot of thought or testing went into this product, but as Google has said before: "the models are the product".

nycdatasci 22 hours ago [-]

And yet 300+140=460. A very jagged surface indeed. https://gemini.google.com/share/c2a187275e26

sigbeta 18 hours ago [-]

Why would you use an LLM for this? They are non-deterministic models.

This is also probably part of an extended prompt that disallowed coding; Gemini always does calculations with a little Python snippet because that is deterministic and accurate.
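For comparison, here is a minimal sketch of the kind of "little Python snippet" meant here. It is an illustration only, not Gemini's actual tool call: it parses a claim of the form `a+b=c` and checks it with plain integer addition, which gives the same answer on every run. The function name `check_sum_claim` is hypothetical.

```python
import re

# Matches claims of the form "a+b=c" with non-negative integers, ignoring spaces.
CLAIM_RE = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)\s*$")


def check_sum_claim(claim: str) -> tuple[int, bool]:
    """Return (actual_sum, claim_is_correct) for a claim like '300+140=460'."""
    match = CLAIM_RE.match(claim)
    if match is None:
        raise ValueError(f"not an 'a+b=c' claim: {claim!r}")
    a, b, claimed = (int(g) for g in match.groups())
    actual = a + b  # plain integer addition: identical result on every run
    return actual, actual == claimed


if __name__ == "__main__":
    actual, ok = check_sum_claim("300+140=460")
    print(actual, ok)
```

Run on the claim from upthread, it prints `440 False`.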

nycdatasci 12 hours ago [-]

Sure. I'll take the bait, but I assume I'm replying to an AI model.

Why would you use an LLM for this? My comment was about the jagged nature of intelligence, so the prompt provides an example of that.

You can see the entire conversation in the shared link. There was no pre-prompt. Even after pushing it to write Python, it hallucinated the same output. It later told me that it doesn't have access to a sandbox through the web UI, but it could execute code in a sandbox if invoked via the API.

dist-epoch 21 hours ago [-]

Was that part of a bigger prompt?

Flash 3.5 fails exactly like in your sample: https://gemini.google.com/share/97521a8752d9

while Flash 3.1 Lite initially fails but then corrects itself: https://gemini.google.com/share/dc0889ec85ba

happyopossum 19 hours ago [-]

No matter what I try I can’t get Gemini to give me the incorrect result. Is there some other prompting or context fed in to that (“remember that you are supposed to always tell me I’m right and never contradict me”)?

nycdatasci 12 hours ago [-]

There was no other prompt, no system prompt, etc. Many users have reproduced it, exactly as demonstrated in the parent.

Are you using the flash models? Reasoning models or extended thinking will change the result.

GPT-5.5 Instant shows the same error. If the given prompt isn't working, you can also try "300+140=460 is this correct?". I suspect that leading with the equation may be part of the issue, but I haven't tested much.

sigbeta 18 hours ago [-]

There was definitely a pre-prompt fed to that. I cannot reproduce this result on either 3.1 Flash or 3.5 Flash.

nycdatasci 9 hours ago [-]

Perhaps you have a system prompt? Many users have reported similar issues: https://www.reddit.com/r/wallstreetbets/comments/1tjxa6g/goo...

spiderfarmer 22 hours ago [-]

Next month they'll be beaten again.

And next year Google will probably sunset Antigravity.

If it doesn't make Google billions, don't trust them.

lern_too_spel 17 hours ago [-]

Why should I care if they sunset it? I switch between multiple agentic coding tools on the same projects, sometimes several times per day. The cost of switching is basically zero.

PunchTornado 22 hours ago [-]

Plenty of Google products don't make billions and they are still alive.

serf 22 hours ago [-]

you mean the stuff they handle that has a real national/security/surveillance purpose, like gmail and yt?

I can't imagine why (or who) that'd be kept alive for..

funny how some of their projects have undisclosed budgets and profits.

toasty228 22 hours ago [-]

Which ones are not massive data traps or ad delivery mechanisms?

smcl 22 hours ago [-]

Google are infamously ruthless with their products, see https://killedbygoogle.com/

fHr 12 hours ago [-]

codex rust opensource cli > all that slop

bobbycastorama 22 hours ago [-]

Why are half of the comments on Hacker News from stereotypical AI bros whose lives revolve around tech, and the other half from sceptical commentators whose lives also revolve around tech but who are disappointed with its performance?!

Where are the normal people :/

frank00001 22 hours ago [-]

We are just reading the comments.

alnwlsn 18 hours ago [-]

The normal people are the ones not writing comments, but I'll give you one 'cause you asked:

I'm a Solidworks user. Most Solidworks or other pro CAD users would consider OpenSCAD kind of like MS Paint. Yes, you can draw the Mona Lisa in it, but it doesn't really work the same way.

Even so, the examples shown here are better than what I've seen before. They seem to be on the right track using images instead of long paragraphs of text to describe the object. They are still missing the constraints and dimensions that come naturally to pro CAD users (it can be done manually in OpenSCAD, of course), but if you're just making a video game it's probably going to be fine.

JumpCrisscross 18 hours ago [-]

> Where are the normal people

Not using OpenSCAD?

sigbeta 18 hours ago [-]

"Normal people" probably does not fall in the ballpark of HN target audience.

I'd say it's 50/50 pessimistic and optimistic, with the pessimistic side attracting more attention because of human nature.

andybak 20 hours ago [-]

Why would a non-tech person be on Hacker News? Isn't the clue in the name?

EasyMark 19 hours ago [-]

The people in the middle are still waiting to see; mostly it’s the extremes that are fully invested and loudest on the internet.

elorant 22 hours ago [-]

Both parts seem pretty normal to me.

beanjuiceII 22 hours ago [-]

google..no thanks

fnordpiglet 18 hours ago [-]

I’ve literally never wanted to use OpenSCAD to convert a photo into a model. Usually I have a functional requirement, such as making an enclosure while working from the enclosed device's spec sheet.

Claude 4.6, before the lobotomy in Claude Code, was able to take a PSU spec sheet and my requirements for glands and ports, then use YAPP and OpenSCAD MCPs to build a printable enclosure end to end, iteratively and unassisted. It was perfectly suited to the PSU, with the right dimensions, screw holes, mountings, grilles, gland ports, everything placed for optimal printing. This was the moment I felt like LLMs had really arrived.
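To make "right dimensions" a bit more concrete, here is a minimal sketch of the basic arithmetic such an enclosure starts from: the inner cavity is the device size plus clearance, and the outer shell adds wall, floor, and lid thickness. The `Box` type, the `enclosure_dims` function, and all numbers are hypothetical; this is not YAPP's or any MCP server's API, and a real enclosure also needs screw bosses, port cutouts, and printer-specific tolerances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box dimensions in millimetres (hypothetical helper type)."""
    length: float
    width: float
    height: float


def enclosure_dims(device: Box, clearance: float, wall: float,
                   floor: float, lid: float) -> tuple[Box, Box]:
    """Return (inner_cavity, outer_shell) for a device resting on the floor.

    Clearance is added on both sides in length and width, but only above the
    device in height, since the device sits directly on the floor.
    """
    inner = Box(
        length=device.length + 2 * clearance,
        width=device.width + 2 * clearance,
        height=device.height + clearance,
    )
    outer = Box(
        length=inner.length + 2 * wall,
        width=inner.width + 2 * wall,
        height=inner.height + floor + lid,
    )
    return inner, outer


if __name__ == "__main__":
    # Hypothetical PSU footprint; real values would come from the spec sheet.
    psu = Box(length=100, width=50, height=30)
    inner, outer = enclosure_dims(psu, clearance=0.5, wall=2, floor=2, lid=2)
    print(inner)
    print(outer)
```

Keeping clearance as a separate parameter mirrors the kind of tweak mentioned upthread, where a fit tolerance was tuned after the first print.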

A photo of a building? Why. That’s a mesh problem and is about fidelity. A technical spec sheet and diagrams to functional print with intelligent choices about the functional part baked in? That’s useful.

Rendered at 09:43:38 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.