Re: [jetty-users] SolrJ/Solr: HTTP protocol violation: Authentication challenge without WWW-Authenticate header
On 6/4/2023 8:37 AM, Greg Wilkins wrote:
Shawn,

The other thing to do is to bisect the changes from a jetty-9.4.x that you think works to jetty-9.4.48, which you think does not work. There are not that many changes in jetty 9.4, so it should not be too much work... but it will be some effort, which is why we don't offer open source support on jetty-9.4 anymore. We just don't have the cycles to track down things like this (especially when, more often than not, after doing the work we find it is not a bug in jetty). If you can identify an actual commit that you think is bad, that will help in working out what the issue is.

Of course, the other thing to do is to get somebody to take out a commercial support agreement, so we can put in the effort to help debug this issue (even if it turns out not to be jetty).
I don't have a 9.4 version that works, so I don't think I can bisect it. I have a 9.4.48 server (the Solr side) and a 10.0.13 or 10.0.15 client (in SolrJ) that are having trouble. There are no other versions. Upgrading the Jetty client by two point releases (10.0.13 to 10.0.15) did not help. The problem could be in the Jetty server, in Solr, in SolrJ, in the Jetty client, or even in my own code.

A previous version of the migration code didn't index simultaneously with multiple threads, and it did not throw this error. It DID throw OOME, though... which it did long before it reached 20 million documents. I never did find the memory leak. I happened to be working on code that I was able to adapt for this purpose. Since the single-threaded version never hit this error, it seems to be related somehow to really pounding the servers with a lot of parallel requests. I'm running my program with one query thread and two indexing threads, but when I run it against my little single-node, single-shard Solr install, it has 53 threads running. SolrJ is figuring out which shard each document goes to, splitting the batch further into smaller batches for each shard, and making lots of HTTP requests in parallel. Because the system where I am running this for real has three shards, there would be even more threads.
That first version of the program sent its requests to a load balancer, and then Solr figured out which server to send each request to. This version talks directly to Solr using the SolrCloud-aware client: it knows exactly which shard each document is headed to, and which server hosts the leader replica of each shard. This means Solr never has to forward received requests to another server. The data is indexed a LOT faster with the new version of the program.

I've found a workaround that is working: split the job into batches that are all significantly smaller than 20 million documents, which is the approximate danger point. I discovered this workaround after I had started this thread. So at this point I would like to figure out what happened, so we can decide where the problem is. I think that if we could upgrade Solr (and with it the Jetty server), the problem would disappear, but there are a bunch of moving parts and it might be in any of them. It could always be a problem in my code, which I have been examining closely. Its usage of SolrJ is pretty simple, and so far I haven't found a problem.

Unfortunately, the window where I can perform lots of tests has closed. The servers are moving into production, and the extra load from my program running is unacceptable.

Thanks,
Shawn