Re: [sumo-user] Multiple TraCI calls for multiprocessing

I tried manual assignment using:
BASE_PORT = 8800
MAX_WORKERS = 8

def objective(trial: optuna.trial.Trial):
    ...
    port = BASE_PORT + (trial.number % MAX_WORKERS) 
...

But it still fails with "Could not connect". The study does keep running, but effectively on one port at a time, with errors reported for the other ports. With n_jobs=1, traci connects and runs in both cases (get_free_port() and manual assignment), but with n_jobs > 1 it fails to connect. The exact error is as follows:
[I 2025-07-10 15:10:04,932] A new study created in memory with name: multi-objective-rl
 Retrying in 1 seconds
 Retrying in 1 seconds
[W 2025-07-10 15:10:07,283] Trial 1 failed with parameters: {'p_empty': 0.8874815743440081, 'p_full': -0.10408980007074731, 'a_s': 0.0058245809451700215, 's_penalty': -0.1734385492609638, 's_high_scale': -0.0036933793219962327, 'w_penalty': -0.012466409024325576, 'w_high': -1.5633006203911357, 'alpha': 2.1582235720722465, 'beta': 0.43083178950925743, 'gamma': 0.02726034618253692, 'c': 46.34884736431611, 'scale_reward': 0.12149915994938308, 'scale_teleport': 0.09412134039139407, 'dist_factor': 0.7648090027509103} because of the following error: FatalTraCIError('Could not connect.').
Traceback (most recent call last):
  File "/home/hp/.local/lib/python3.10/site-packages/optuna/study/_optimize.py", line 201, in _run_trial
    value_or_values = func(trial)
  File "/home/hp/Downloads//tuner/optuna_tune.py", line 48, in objective
    reward_std, teleport_count = train_rl_with_sumo(reward_params, port)
  File "/home/hp/Downloads/tuner/main.py", line 44, in main
    env.reset()
  File "/home/hp/Downloads/tuner/environment.py", line 61, in reset
    self._start_sumo()
  File "/home/hp/Downloads/tuner/environment.py", line 57, in _start_sumo
    traci.start(sumo_cmd, port=self.port)
  File "/home/hp/.local/lib/python3.10/site-packages/traci/main.py", line 157, in start
    raise FatalTraCIError("Could not connect.")
traci.exceptions.FatalTraCIError: Could not connect.



On Thu, Jul 10, 2025 at 1:53 PM Jakob Erdmann <namdre.sumo@xxxxxxxxx> wrote:
Try assigning a port manually instead of relying on get_free_port, as there might be a race condition between get_free_port and binding to that port.
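
A minimal sketch of the race window described here, using only the standard library. get_free_port is copied from the original post; the second socket is an assumption that stands in for any other worker or process that asks for a port in the meantime.

```python
import socket


def get_free_port() -> int:
    # Same approach as in the original post: let the OS pick an unused port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


port = get_free_port()

# The probe socket is already closed, so nothing reserves the port.
# Any other socket can bind it before SUMO gets the chance to.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as other:
    other.bind(("", port))
    taken_by_other = other.getsockname()[1] == port
```

Because the port is only known to be free at the moment of the probe, a second caller may receive or bind the same number before the simulation starts listening on it.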

On Wed, Jul 9, 2025 at 2:11 PM Rohan Verma via sumo-user <sumo-user@xxxxxxxxxxx> wrote:
Dear community,

I am attempting to run multiple parallel processes using multiprocessing along with Optuna for hyperparameter optimization. While each individual run executes correctly, I am encountering an issue where traci fails to connect to SUMO when running in parallel (via n_jobs > 1).

To manage separate instances, I ensure each process uses a different TCP port by calling a get_free_port() function (see snippet below), and I pass this port into my function, which initializes the SUMO simulation and connects via traci.

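import socket

def get_free_port() -> int:
    # Bind to port 0 so the OS picks an unused port; the port is released again on return.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))
        return s.getsockname()[1]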

Despite assigning unique ports to each trial, the subprocesses fail to establish a connection to SUMO. The error typically occurs at the point where traci tries to connect.
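
Since get_free_port() closes its probe socket before returning, repeated calls are not strictly guaranteed to return different ports. A hedged sketch of a variant that keeps all probe sockets open until every port has been collected, so the ports within one call are distinct; get_free_ports is a hypothetical helper, not part of traci or sumolib.

```python
import socket
from contextlib import ExitStack


def get_free_ports(n: int) -> list[int]:
    """Hypothetical variant of get_free_port() that returns n distinct ports."""
    with ExitStack() as stack:
        ports = []
        for _ in range(n):
            s = stack.enter_context(
                socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            )
            s.bind(("", 0))
            ports.append(s.getsockname()[1])
        # All probe sockets are still open here, so the OS cannot have
        # handed out the same port twice within this call.
        return ports
```

The ports are released once the function returns, so other processes can still claim them afterwards; the sketch only ensures that the ports handed to the trials differ from one another.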

I'm using:

  • multiprocessing.set_start_method("spawn", force=True)

  • Separate ports for each run

  • Latest SUMO version

Do you have any insight into what might be causing traci to fail under parallel execution? Are there known issues or best practices for handling multiple simultaneous SUMO instances?

Any guidance or suggestions would be greatly appreciated.

Thanks and regards,
Rohan
_______________________________________________
sumo-user mailing list
sumo-user@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/sumo-user
