A Rabbit Hole post showing a very odd problem, the process of tracking it down and the answer. Note that none of the technology listed was actually at fault, and no rabbits were harmed.
The Spot
When an npm ci --only=production was running in a RedHat UBI8 docker image, the install didn’t complete. The normal constantly changing ‘progress’ line was completey still. No change at all giving the appearence of something deadlocked.
Quick test on Ubuntu (that was the docker host) didn’t indicate any issues. Was this an issue because of the UBI8 image? I don’t use them often so less familar with their behaviour.
The Lure
This was an environment that had to work. For demo purposes and also it can be used ‘in the real world’. Plus if there was any any concern about UBI8 images that would ‘be bad’ for production use cases.
The Descent into the hole
- Reduce the test case by copying the dockerfile and reducing it so it included just the essentials. Base image, node installation.
- confirm that this shows the problem, it did
- Update dependencies node was out of date, so updated that and
npmto the latest, no problem still there - Observation that the ‘hang’ was always referencing some form of the network communication.
- Was
npmregistry up? Yes it was, confirmed bywget - But it seemd to be a different node module everytime
- Verbose mode didn’t show anything
- Was
- Vary type trigger conditions this was using
npm ci --only=productionwhat aboutnpm install.. had a few dev-dependencies that I only had locally so cleared those up and out the way- same basic behaviour
- was this
npmitself?
- Alternatives We’ve used
pnpmbefore, would that work? Yes - and a lot faster better output actually.
Pragmatics kick in here, and really need to make progress on work in progress before this, so swap to using pnpm
-
Further reduce Could this be docker itself? This is Docker in WSL2 after all.
- create a docker image based on Ubuntu and Node latest versions
- simple
docker runto map local directory into the image and test - the same behaviour is there, so UBI is out of the picture.
-
Networking this is going to be something amiss with networking maybe? By why does pnpm work?
Pragmatics again, at least this isn’t UBI8. Production level issues are therefore not so much a concern. Probability is that it’s a local problem.
-
Discussion with collegues no bright ideas, but need to reproduce the problem afresh
- used the docker run approach now the standard way to test
- with discussions ongoing - left the process running. If this is network then it would timeout (maybe?
-
Timeout did occur, but with an address that was
172.56.78.3:4586(note slightly nagging that there was something familar especially the port.)172.*addresses are local address with WSL, but it’s not the address of the vm currently.- try again… timeout took a lot longer, but the response from npm was useful
- both timeouts where on the same npm module. So the progress ‘message’ wasn’t an accurate reflection of the current module being located
- Is specific to this module?
-
Looking at the
npm-shrinkwrap.jsonthis address crops up a bit, but on an apparently random selection. No patterns, eg all dependencies on the same thing- delete the
npm-shrinkwrap.jsonand recreate - check - and no 172 addresses in resolved… ah
- delete the
-
Re run the core test - passes. no problems.
-
Ascend to the surface
The Complete Picture
- Earlier I had used this node module to test local changes to a node module. So a copy of Verdaccio had been running to locally host node modules. Hence why I recognised the port number.
- But Verdaccio caches modules as well for speed. The
npm-shrinkwrap.jsontherefore was a chimera of modules pulled direct from NPM and also for ones that had been cahced by verdaccio. - Clearly I hadn’t deleted the
npm-shrinkwrap.json- and npm appears to not do a complete recreate of it onnpm installthis had been done a few times - So with these
172.*addresses the install in the Docker didn’t work.
The still vague bits
- Why when the
npm installwas done NOT in the docker container did those172.*addresses not cause problems? - Why was the
npm-shrinkwrap.jsonnot updated on newnpm install?
TL;DR;
In the event of odd behaviour when installing node modules, delete all shrinkwrap, package lock files and node_modules.