hush-cli stop never fully shuts down full node #336

Open
opened 7 months ago by duke · 3 comments
duke commented 7 months ago
Owner

I am sometimes seeing weird behavior where `hush-cli stop` never fully shuts down the node, no matter how long you wait. The end of `debug.log` shows:

2023-10-30 13:11:01 tor: Thread interrupt
2023-10-30 13:11:01 torcontrol thread exit
2023-10-30 13:11:01 scheduler thread interrupt
2023-10-30 13:11:01 msghand thread interrupt
2023-10-30 13:11:01 net thread interrupt
2023-10-30 13:11:05 addcon thread interrupt
2023-10-30 13:11:28 opencon thread interrupt

and then the node hangs forever. No RPCs can be run because the RPC interface is already shut down, and even creating a `~/.hush/HUSH3/plz_stop` file does nothing. Something is stuck in an infinite loop. I have seen this on both the `dev` and `duke` branches.

Notably, we never see `txnotify thread interrupt` in the log and the txnotify thread keeps running forever, so the hang seems related to that thread.

The only way to stop the node in this state is `kill -9`.

duke added the bug label 7 months ago
Poster
Owner

The `plz_stop` file method does not appear to work because the check is done in the `scheduler` thread, which has already stopped by the time this bug happens. We may be able to fix that by checking for the file in the txnotify thread or elsewhere.
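
As a rough illustration of that idea, here is a minimal C++ sketch of a worker loop that checks both an in-process stop flag and a stop-request file on every iteration, so the file check does not depend on a separate scheduler thread still being alive. This is not code from the Hush codebase: `StopFileExists`, `RunWorkerLoop`, the iteration cap, and the 10 ms sleep are all hypothetical names and values chosen for the example, and the real txnotify thread may be structured quite differently.

```cpp
#include <atomic>
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper: returns true if the stop-request file can be opened.
bool StopFileExists(const std::string& path) {
    std::ifstream f(path.c_str());
    return f.good();
}

// Hypothetical stand-in for a long-running worker loop (for example txnotify).
// Each iteration first checks the in-process flag and the stop file, and
// leaves the loop as soon as either one requests a shutdown.
// Returns the number of work iterations completed before stopping.
int RunWorkerLoop(const std::atomic<bool>& stopRequested,
                  const std::string& stopFilePath,
                  int maxIterations) {
    int iterations = 0;
    while (iterations < maxIterations) {
        if (stopRequested.load() || StopFileExists(stopFilePath)) {
            break;  // shutdown requested via flag or file
        }
        // ... one unit of real work would go here ...
        ++iterations;
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return iterations;
}
```

The point of the sketch is only that the stop-file check lives inside the loop that would otherwise keep running. Whether this helps in practice depends on where the real thread is actually stuck: if it is blocked inside a single call that never returns, a per-iteration check like this would never be reached.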

duke changed title from hush-cli stop never fuly shuts down full node to hush-cli stop never fully shuts down full node 7 months ago
Poster
Owner

So far I have only seen this when importing at least one zaddr privkey during IBD via `hush-cli`. The node will sync to 100% and function normally, but is then unable to stop correctly. This is an edge case that GUI users and most CLI users will not run into. The bug may also be present in the `master` branch; I still need to test that.

Poster
Owner

I thought `resendtx=0` might be a hacky fix for this, but I still see the problem even with `resendtx=0`.
