Hi,
I'm not jumping to a definite answer yet, but there are a few options to keep in mind.
There are currently two main frameworks in use to base the tests on: JUnit and TestNG. As far as I can tell, this choice is completely arbitrary and neither has any real advantage over the other (please correct me if I'm wrong).
There are two phases in which the tests can run with Maven: the unit test phase (using Surefire) and the integration test phase (using Failsafe). Sometimes the choice is arbitrary, sometimes there's an actual reason: integration tests run after packaging, so they can use the full output produced by a Maven module, e.g. a .war.
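As a rough illustration of how the two plugins tell tests apart when no includes are configured: Surefire picks up classes named Test*, *Test, *Tests or *TestCase, and Failsafe picks up IT*, *IT or *ITCase. The sketch below only mimics that naming check on simple class names; the class and method names (PhaseNamingSketch and so on) are made up for this mail and are not part of either plugin.

```java
import java.util.List;

public class PhaseNamingSketch {

    // Mirrors Surefire's default include patterns, applied to a simple class name.
    public static boolean matchesSurefireDefaults(String simpleName) {
        return simpleName.startsWith("Test")
                || simpleName.endsWith("Test")
                || simpleName.endsWith("Tests")
                || simpleName.endsWith("TestCase");
    }

    // Mirrors Failsafe's default include patterns, applied to a simple class name.
    public static boolean matchesFailsafeDefaults(String simpleName) {
        return simpleName.startsWith("IT")
                || simpleName.endsWith("IT")
                || simpleName.endsWith("ITCase");
    }

    public static void main(String[] args) {
        // Hypothetical class names, only for illustration.
        for (String name : List.of("LoginTest", "LoginIT", "LoginHelper")) {
            System.out.println(name
                    + " surefire=" + matchesSurefireDefaults(name)
                    + " failsafe=" + matchesFailsafeDefaults(name));
        }
    }
}
```

A TCK that overrides the includes in its pom would of course not follow these defaults.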
There are also two main ways in which tests are conducted in practice using Arquillian. There is the so-called "testable" mode (also called "magic mode"), where a test class that looks like a client test is magically (hence the name) transported to the server and run from there. Arquillian then automagically transfers the test results back to the client (JUnit or TestNG) using a protocol.
The other mode is "non-testable" (also called "run-on-client"), where the unit test or IT explicitly runs on the client and submits explicit HTTP requests to the server under test. There are a few sub-variants here again. Some TCKs have e.g. a Servlet that returns some result, and that result is then inspected by the client test, e.g. assert that the response contains "user logged-in". Others have a Servlet that manually inspects some results server-side and then only returns "succeeded" or "failed" to the client test. In some cases there's even a kind of JUnit/TestNG emulation layer implemented and manually invoked on the server side.
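To make the two run-on-client sub-variants a bit more concrete, here is a minimal sketch using only the JDK (Java 11+), without Arquillian or a real container. The JDK's built-in HttpServer stands in for the server under test, and the paths /login and /selfcheck as well as the class name RunOnClientSketch are invented for this example; a real TCK would deploy an actual Servlet instead.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RunOnClientSketch {

    // Stand-in for the server under test (assumption: a real TCK deploys a .war here).
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);

        // Sub-variant 1: the endpoint returns raw output; the client decides pass or fail.
        server.createContext("/login", exchange -> respond(exchange, "user logged-in"));

        // Sub-variant 2: the check happens server-side; only the verdict is returned.
        server.createContext("/selfcheck", exchange -> {
            boolean ok = "abc".toUpperCase().equals("ABC"); // placeholder server-side check
            respond(exchange, ok ? "succeeded" : "failed");
        });

        server.start();
        return server;
    }

    private static void respond(HttpExchange exchange, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    // The "client test" side: a plain HTTP GET against the server under test.
    public static String get(int port, String path) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:" + port + path))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();

            // Sub-variant 1: the client inspects the content of the response.
            if (!get(port, "/login").contains("user logged-in")) {
                throw new AssertionError("expected response to contain 'user logged-in'");
            }

            // Sub-variant 2: the client only checks the verdict computed on the server.
            if (!get(port, "/selfcheck").equals("succeeded")) {
                throw new AssertionError("server-side check reported failure");
            }

            System.out.println("both client-side checks passed");
        } finally {
            server.stop(0);
        }
    }
}
```

In the first variant the knowledge about what counts as a pass lives in the client test; in the second it lives in the deployed code, which makes the client trivial but hides the reason for a failure unless the server also logs it.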
Finally, and this largely follows from the choices above, there are TCKs that produce a "single test jar" containing all the tests, which therefore require a "runner pom" that imports this test jar via a special directive. Other TCKs are a "multi-module Maven project", where each piece of test code is its own Maven module and can be used and deployed as a single standalone application (and therefore also debugged as such).
We can probably create a table that categorizes the existing TCKs according to the options above.
Kind regards,
Arjan Tijms