From my perspective, the fact that GPT is actually very familiar with my project(which might be one of the reasons, given that we’ve been “vibe coding” together on and off for about two weeks)means that when I ask him to write tests, I only need to provide a brief description, and it knows exactly what needs to be tested.
In actual practice (after adding the tests, I tried reverting production code), to see the test failed, it’s almost always happen.
So, these tests are actually effective—even though they were written by AI and I didn’t even bother to look at them once during the whole process.
As I mentioned in my previous comment, AI can actually pull this off.
In fact, I think the Crystal specs written by GPT are quite good. The two files below were written entirely by AI (I didn’t change a single word):
If you are strictly pursuing DRY (Don’t Repeat Yourself) principles in your testing, then AI might fall short. But like I said before, back when I was writing a lot of tests in Ruby, I only cared about keeping the production code clean. I intentionally wrote redundant tests (in fact, due to a lot of copy-pasting, my test code was several times larger than my production code). So, in my view, letting AI handle this kind of redundancy nowadays is actually a great idea.
I’m not a dogmatist when it comes to TDD (Test-Driven Development) either(for example, the “test-first” approach). Especially after moving from Ruby to Crystal, the weight I give to testing has decreased significantly.
That said, regression testing for core features is still vital: you want to make sure you don’t accidentally introduce breaking changes during a refactor without realizing it.
I’m sure it can do an excellent job most of the time, but everybody got the story about that one time where the LLM fixed the test by removing the important part. If you don’t even review it, how can you be sure that they’re as good as you think?
If you don’t even review it, how can you be sure that they’re as good as you think?
I don’t care the spec code, only if can work.
the LLM fixed the test by removing the important part.
How this happen? I am chatting with GPT in codex, he know what is current need doing, if he change my production code, I know it immediately.
In fact, I write most of the production code myself (with the help from GPT, of course). I take responsibility for the code I write, while the AI is responsible for the specs. If there’s an error in the specs, GPT will point out the cause during our conversation.
GenAI cheats every time. instead of solve the problem properly, it will take shortcut, like hard code the solution for the test suit directly in the library, or adjust the unit case to the wrong output so it passes the test.
believe me, I have seen it doing it every time, no matter what model you are using, they all cheat when they can’t think of any proper solution.