implement netcat operating mode#1222
Conversation
c6cfe51 to
e9425c8
Compare
|
a line-too-long codacy issue was fixed |
g1itch
left a comment
There was a problem hiding this comment.
How it stops? Maybe handle SIGINT? I cannot stop it either by SIGINT or SIGTERM.
| sql.execute('''INSERT INTO objectprocessorqueue VALUES (?,?)''', | ||
| objectType, data) | ||
| numberOfObjectsInObjProcQueue += 1 | ||
| logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' % |
| outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object | ||
| """ | ||
|
|
||
| import threading |
There was a problem hiding this comment.
Maintain a order all the imports should be first.
from statements should be written after import statements.
| while True: | ||
| # read a line in hex encoding | ||
| line = self.inputSrc.readline() | ||
| if len(line) == 0: |
There was a problem hiding this comment.
why can't we use
if not line or if not line.strip() ?
There was a problem hiding this comment.
"if not line" - can't strip an EOF :)
| self.inputSrc = inputSrc | ||
| logger.info('stdInput thread started.') | ||
|
|
||
| def run(self): |
There was a problem hiding this comment.
Please write a method with only 15 or 20 lines.
There was a problem hiding this comment.
No. Go big or go home :)
There was a problem hiding this comment.
Over longer term, I would also prefer to have shorter methods, some parts of the old code, like the class_objectProcessor are too long, but for this PR it's fine the way it is.
There was a problem hiding this comment.
Noted with thanks. I think the ad-hoc object parsing accounts for a lot of avoidable LLOC wastage, however, as discussed in PR #1149 , there was no readily usable parser function to use instead.
| """ | ||
| The objectStdOut thread receives network objects from the receiveDataThreads. | ||
| """ | ||
| def __init__(self): |
| objectType, data = row | ||
| queues.objectProcessorQueue.put((objectType, data)) | ||
| sqlExecute('''DELETE FROM objectprocessorqueue''') | ||
| logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn))) |
|
@g1itch it stops clean on Ctrl-D (EOF) on STDIN or dirty with SIGKILL. Clean exit by signal is not available because of blocking read on STDIN. |
|
@omkar1117 human or robot? If human, we need to talk. |
|
Of course, Ctrl+D. Yes, it works. It's enough for me. |
63893b5 to
7d6a0b5
Compare
|
all pep8 issues are now fixed (except line length which is endemic) - thanks @omkar1117 |
|
It could be useful to have an option to dump the messages which caused an exceptions with help of this |
|
I'm sorry, I'm not sure what you mean, in a number of ways :)
Asking because these are all things that I've considered. :) |
It may be another application for this new code when it will be merged. |
|
Yes, this makes sense. As a matter of fact, the initial commit from PR #1149 was tapping into bmproto.py upstream of the sanity checks, and was capturing broken objects along with the good ones. It was only after updating the code to tap into the ObjectProcessorQueue instead (downstream of bmproto) that we stopped capturing them. Anyway, this should be easy to add in the future if needed, with a couple of conditionals inside bmproto.py |
|
Hi guys, not trying to rush things, just double-checking if there are any open issues requiring attention from my end. |
|
Haven't had time yet to check it out |
coffeedogs
left a comment
There was a problem hiding this comment.
I haven't tested yet, will do after the sql refactor. Any reason to think it wouldn't work cross-platform? Note I'm new to the codebase so apologies for any dumb points I raise.
| opts, args = getopt.getopt( | ||
| sys.argv[1:], "hcdt", | ||
| ["help", "curses", "daemon", "test"]) | ||
| sys.argv[1:], "hcdtn", |
There was a problem hiding this comment.
The option '-n' often has connotations of 'dry run' or 'numeric only'. While we don't have anything using 'n' right now I'm thinking of avoiding future confusion if we do. Is there another letter that you would say makes as much sense as 'n'?
There was a problem hiding this comment.
I thought of that and decided that none of the common uses of "-n" (dry-run, no-resolve, no-output, numeric etc) are applicable in the Bitmessage context.
A second best option would be "-c" (netCat) which is already taken.
| sys.argv[1:], "hcdt", | ||
| ["help", "curses", "daemon", "test"]) | ||
| sys.argv[1:], "hcdtn", | ||
| ["help", "curses", "daemon", "test", "mode-netcat"]) |
There was a problem hiding this comment.
The other options are modes too. Perhaps you could drop the "mode-"?
There was a problem hiding this comment.
See general comment, there's a naming theme.
| # unhex the input with error rejection | ||
| try: | ||
| binObject = unhexlify(hexObject) | ||
| except Exception: |
There was a problem hiding this comment.
According to the docs unhexlify might throw TypeError. Can we be more specific here otherwise codacy will complain of a bare except.
| logger.info("STDIN: Invalid object size") | ||
| continue | ||
|
|
||
| if not state.enableNetwork and state.enableGUI: |
There was a problem hiding this comment.
You set state.enableGUI = False above. Will this conditional ever be reached?
There was a problem hiding this comment.
state.enableGUI is only false in netcat mode; std_io may have other uses outside of netcat mode, which are not currently included. Yes, the conditional makes sense from a logical perspective.
| try: | ||
| # stdioStamp, = unpack('>Q', unhexlify(hexTStamp)) | ||
| _, = unpack('>Q', unhexlify(hexTStamp)) | ||
| except Exception: |
There was a problem hiding this comment.
Can we just catch struct.error here? Oh, plus TypeError for unhexlify.
|
|
||
| # duplicate check on inventory hash (both netcat and airgap) | ||
| if inventoryHash in Inventory(): | ||
| logger.info("STDIN: Already got object " + hexlify(inventoryHash)) |
There was a problem hiding this comment.
Not sure how many of these one might expect during 'normal' operation. Is 'info' the right log level? Maybe this and the next log statement could be debug level if it would otherwise lead to info being spammed. Maybe we expect people running in this mode to not care about other info level statements but be very interested in knowing that messages were or were not added, I don't know.
There was a problem hiding this comment.
You're probably right. It's only useful when running in manual mode and pasting objects in console, useless in most other scenarios. I'll change it to debug.
I guess what I really need here is not logger.info pollution, but the ability to switch logging levels as needed via getopt; I might actually open an issue about that.
EDIT - Done: #1246
| # REFACTOR THIS with objectProcessor into objectProcessorQueue | ||
| queryreturn = sqlQuery( | ||
| '''SELECT objecttype, data FROM objectprocessorqueue''') | ||
| for row in queryreturn: |
There was a problem hiding this comment.
By waiting until we have synchronously pulled all objects from the database and put all objects on the queue we are missing out on some benefits of parallelism and we'll hit a memory limit with large data sets. Maybe this is unavoidable or related to your suggested refactoring.
Also, keeping the number of places where raw SQL is used to a minimum is a good idea. Again, I assume this would be part of your suggested refactoring.
There was a problem hiding this comment.
The block marked by refactor ... /refactor is duplicated verbatim from class objectProcessor; the comment indicates my intention to refactor it into class objectProcessorQueue, a task which is outside the scope of this PR.
| # /REFACTOR THIS | ||
|
|
||
| def run(self): | ||
| while True: |
There was a problem hiding this comment.
When the queue is empty would this not cause unnecessary 100% CPU? Perhaps a small sleep is needed inside the while loop?
Edit: no it wouldn't, Queue.get(block=True) blocks when the queue is empty. I was thinking of the AMQP library I was most recently using.
| numberOfObjectsInObjProcQueue += 1 | ||
| logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' % | ||
| str(numberOfObjectsInObjProcQueue)) | ||
| # /REFACTOR THIS |
| objectType, data = queues.objectProcessorQueue.get() | ||
|
|
||
| # Output in documented format | ||
| print "%016x" % int(time.time()) + '\t' + hexlify(data) |
There was a problem hiding this comment.
Should we use sys.stdout.write() instead of print here?
There was a problem hiding this comment.
It would be indeed the logical choice, I'm just not sure how cross-platform it would be. Print is 100% portable and not wrong.
|
@coffeedogs a general comment before I address some specific points; the answer to most your queries is along the lines of "backward compat", "minimizing the diff" and "one change at a time".
As much as possible, I tried to preserve compatibility with both PyBitmessage and the original fork. |
7d6a0b5 to
1b61750
Compare
1b61750 to
5fd8990
Compare
This PR enables a special headless operating mode (which I named "netcat" mode due to similarities with the Unix utility) where all object processing is disabled. Instead, raw objects received from the network are output to STDOUT unprocessed, also, any valid raw objects received on STDIN are broadcast to the network.
This is a re-implementation of PR #1149 , using the switches defined in PR #1214 . The discussions from the linked PRs should provide useful background.