implement netcat operating mode by f97ada87 · Pull Request #1222 · Bitmessage/PyBitmessage

f97ada87 · 2018-04-16T10:51:17Z

This PR enables a special headless operating mode (which I named "netcat" mode due to similarities with the Unix utility) where all object processing is disabled. Instead, raw objects received from the network are written to STDOUT unprocessed, and any valid raw objects received on STDIN are broadcast to the network.

This is a re-implementation of PR #1149, using the switches defined in PR #1214. The discussions in the linked PRs should provide useful background.

f97ada87 · 2018-04-16T11:10:55Z

a line-too-long codacy issue was fixed

g1itch

How does it stop? Maybe handle SIGINT? I cannot stop it with either SIGINT or SIGTERM.

omkar1117 · 2018-04-16T12:51:43Z

+                        sql.execute('''INSERT INTO objectprocessorqueue VALUES (?,?)''',
+                                   objectType, data)
+                        numberOfObjectsInObjProcQueue += 1
+                logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' %


PEP8 validation missing here

omkar1117 · 2018-04-16T12:53:23Z

+  outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object
+"""
+
+import threading


Keep the imports in order: all the imports should come first.
from statements should be written after import statements.

omkar1117 · 2018-04-16T12:56:21Z

+        while True:
+            # read a line in hex encoding
+            line = self.inputSrc.readline()
+            if len(line) == 0:


Why can't we use
if not line or if not line.strip()?

"if not line" - can't strip an EOF :)

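The difference between the two checks can be sketched as follows. This is a minimal illustration, not the PR's code: read_hex_lines is a hypothetical helper, and io.StringIO stands in for STDIN. readline() returns an empty string only at EOF, while a blank line arrives as '\n', so "if not line" detects EOF and a separate strip() check is needed to skip whitespace-only lines.

```python
import io


def read_hex_lines(input_src):
    """Yield stripped, non-empty lines until EOF (hypothetical helper)."""
    while True:
        line = input_src.readline()
        if not line:
            # '' is returned only at EOF; a blank line is '\n', which is truthy
            break
        line = line.strip()
        if not line:
            # whitespace-only line: skip it and keep reading
            continue
        yield line


sample = io.StringIO("abcd\n\n  \n1234\n")
print(list(read_hex_lines(sample)))
```
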
omkar1117 · 2018-04-16T12:57:32Z

+        self.inputSrc = inputSrc
+        logger.info('stdInput thread started.')
+
+    def run(self):


Please write a method with only 15 or 20 lines.

No. Go big or go home :)

Over the longer term, I would also prefer to have shorter methods; some parts of the old code, like class_objectProcessor, are too long. But for this PR it's fine the way it is.

Noted with thanks. I think the ad-hoc object parsing accounts for a lot of avoidable LLOC wastage; however, as discussed in PR #1149, there was no readily usable parser function to use instead.

omkar1117 · 2018-04-16T14:24:00Z

+    """
+    The objectStdOut thread receives network objects from the receiveDataThreads.
+    """
+    def __init__(self):


Follow PEP8 styling format.

omkar1117 · 2018-04-16T14:24:18Z

+            objectType, data = row
+            queues.objectProcessorQueue.put((objectType, data))
+        sqlExecute('''DELETE FROM objectprocessorqueue''')
+        logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn)))


PEP8 styling missing

f97ada87 · 2018-04-16T15:19:06Z

@g1itch it stops cleanly on Ctrl-D (EOF) on STDIN, or uncleanly with SIGKILL. A clean exit by signal is not available because of the blocking read on STDIN.
Do you think it's needed? If yes, I can use a non-blocking read instead.
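
For reference, one possible shape of such a non-blocking read is sketched below. It is not part of this PR: read_lines_until_stopped and its stop_event are hypothetical, and select() on a pipe or terminal descriptor is assumed to be available, which holds on Unix but not on Windows. The loop waits on the descriptor with a timeout, so a flag set elsewhere (for example by a signal handler) is noticed between reads.

```python
import os
import select
import threading


def read_lines_until_stopped(fd, stop_event, timeout=0.2):
    """Collect newline-terminated lines from fd until EOF or stop_event is set."""
    lines = []
    buf = b""
    while not stop_event.is_set():
        # Wait at most `timeout` seconds so the stop flag is re-checked regularly.
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            continue
        chunk = os.read(fd, 4096)
        if not chunk:
            # EOF, e.g. Ctrl-D on a terminal or the writer closing a pipe
            break
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            lines.append(line)
    return lines
```

With this layout, SIGINT or SIGTERM handling would only need to set stop_event; the reader then returns within roughly one timeout interval.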

f97ada87 · 2018-04-16T15:20:42Z

@omkar1117 human or robot? If human, we need to talk.

g1itch · 2018-04-16T15:28:15Z

Of course, Ctrl+D. Yes, it works. It's enough for me.

f97ada87 · 2018-04-16T16:05:10Z

All PEP8 issues are now fixed (except line length, which is endemic) - thanks @omkar1117

g1itch · 2018-04-19T12:06:14Z

It could be useful to have an option to dump the messages which caused an exception, with the help of this std_io module.

f97ada87 · 2018-04-19T14:05:01Z

I'm sorry, I'm not sure what you mean, in a number of ways :)

by "messages which caused an exception", are you referring to:

the objects received from userspace via stdin thread, which fail unhexlify or other pre-parsing operations, or
the objects received from the network layer, which fail early sanity checks in bmproto.py and are dropped before even reaching inventory?

by "an option to dump", do you mean the objects should be:

displayed in raw form, and if yes, where exactly, considering that all STDIO is busy; just send them to logger.info?, or
dropped and not processed further, which is pretty much what we already do? :)

Asking because these are all things that I've considered. :)

g1itch · 2018-04-19T14:24:06Z

messages received from network which caused an exception in bmproto (with ERROR - Error processing in debug.log)
dump them to files in format produced by your std_io module for analysis

It may be another application for this new code once it is merged.

f97ada87 · 2018-04-20T09:44:34Z

Yes, this makes sense. As a matter of fact, the initial commit from PR #1149 was tapping into bmproto.py upstream of the sanity checks, and was capturing broken objects along with the good ones. It was only after updating the code to tap into the ObjectProcessorQueue instead (downstream of bmproto) that we stopped capturing them. Anyway, this should be easy to add in the future if needed, with a couple of conditionals inside bmproto.py

f97ada87 · 2018-05-02T09:32:04Z

Hi guys, not trying to rush things, just double-checking if there are any open issues requiring attention from my end.

PeterSurda · 2018-05-02T09:54:59Z

Haven't had time yet to check it out

coffeedogs

I haven't tested yet, will do after the sql refactor. Any reason to think it wouldn't work cross-platform? Note I'm new to the codebase so apologies for any dumb points I raise.

coffeedogs · 2018-05-09T09:11:47Z

            opts, args = getopt.getopt(
-                sys.argv[1:], "hcdt",
-                ["help", "curses", "daemon", "test"])
+                sys.argv[1:], "hcdtn",


The option '-n' often has connotations of 'dry run' or 'numeric only'. While we don't have anything using 'n' right now I'm thinking of avoiding future confusion if we do. Is there another letter that you would say makes as much sense as 'n'?

I thought of that and decided that none of the common uses of "-n" (dry-run, no-resolve, no-output, numeric etc) are applicable in the Bitmessage context.
A second best option would be "-c" (netCat) which is already taken.

coffeedogs · 2018-05-09T09:15:42Z

-                sys.argv[1:], "hcdt",
-                ["help", "curses", "daemon", "test"])
+                sys.argv[1:], "hcdtn",
+                ["help", "curses", "daemon", "test", "mode-netcat"])


The other options are modes too. Perhaps you could drop the "mode-"?

See general comment, there's a naming theme.
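
For readers following the naming discussion, here is a minimal sketch of how the short and long forms from the diff would be recognised by getopt. The option strings are taken from the diff above; parse_modes and the returned mode names are hypothetical and do not reflect how PyBitmessage stores its state.

```python
import getopt


def parse_modes(argv):
    """Return the set of modes selected on the command line (hypothetical helper)."""
    opts, _args = getopt.getopt(
        argv, "hcdtn",
        ["help", "curses", "daemon", "test", "mode-netcat"])
    modes = set()
    for opt, _value in opts:
        if opt in ("-n", "--mode-netcat"):
            modes.add("netcat")
        elif opt in ("-d", "--daemon"):
            modes.add("daemon")
    return modes


print(sorted(parse_modes(["--mode-netcat", "-d"])))
```

An unknown switch makes getopt.getopt raise getopt.GetoptError, so adding a letter to the short-option string is what makes -n acceptable at all.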

coffeedogs · 2018-05-09T09:41:06Z

+            # unhex the input with error rejection
+            try:
+                binObject = unhexlify(hexObject)
+            except Exception:


According to the docs, unhexlify might throw TypeError. Can we be more specific here? Otherwise Codacy will complain about a bare except.
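
A narrower handler could look like the sketch below. unhex_or_none is a hypothetical helper, not the PR's code. On Python 2 unhexlify reports malformed input with TypeError; on Python 3 it raises binascii.Error, which is a subclass of ValueError, so catching both covers either interpreter.

```python
from binascii import unhexlify


def unhex_or_none(hex_object):
    """Decode a hex string, returning None instead of raising on bad input."""
    try:
        return unhexlify(hex_object)
    except (TypeError, ValueError):
        # TypeError: wrong argument type (and malformed hex on Python 2)
        # ValueError: binascii.Error on Python 3 (odd length, non-hex digit)
        return None


print(unhex_or_none("00ff"))
print(unhex_or_none("zz"))
```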

coffeedogs · 2018-05-09T09:42:44Z

+                logger.info("STDIN: Invalid object size")
+                continue
+
+            if not state.enableNetwork and state.enableGUI:


You set state.enableGUI = False above. Will this conditional ever be reached?

state.enableGUI is only false in netcat mode; std_io may have other uses outside of netcat mode, which are not currently included. Yes, the conditional makes sense from a logical perspective.
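
The condition parses as (not enableNetwork) and enableGUI, so it only fires when networking is off and the GUI is on. A small truth-table sketch, with plain booleans standing in for the state flags and branch_taken as a hypothetical name:

```python
def branch_taken(enable_network, enable_gui):
    # `not` binds tighter than `and`: (not enable_network) and enable_gui
    return not enable_network and enable_gui


for net in (False, True):
    for gui in (False, True):
        print(net, gui, branch_taken(net, gui))
```

With enableGUI forced to False, as in netcat mode, the branch is never taken; it can only matter for a caller that leaves the GUI enabled.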

coffeedogs · 2018-05-09T09:55:26Z

+                    try:
+                        # stdioStamp, = unpack('>Q', unhexlify(hexTStamp))
+                        _, = unpack('>Q', unhexlify(hexTStamp))
+                    except Exception:


Can we just catch struct.error here? Oh, plus TypeError for unhexlify.
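
A sketch of the narrower handler for the timestamp field, assuming the 16-hex-digit big-endian format shown in the diff. parse_hex_timestamp is a hypothetical helper; ValueError is included because unhexlify on Python 3 raises binascii.Error, a ValueError subclass, rather than TypeError.

```python
import struct
from binascii import unhexlify


def parse_hex_timestamp(hex_tstamp):
    """Return the timestamp as an int, or None if the field is malformed."""
    try:
        (stamp,) = struct.unpack('>Q', unhexlify(hex_tstamp))
    except (struct.error, TypeError, ValueError):
        # struct.error: decoded field is not exactly 8 bytes
        # TypeError / ValueError: field is not valid hex
        return None
    return stamp


print(parse_hex_timestamp("%016x" % 1523876400))
```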

coffeedogs · 2018-05-09T10:24:11Z

+
+            # duplicate check on inventory hash (both netcat and airgap)
+            if inventoryHash in Inventory():
+                logger.info("STDIN: Already got object " + hexlify(inventoryHash))


Not sure how many of these one might expect during 'normal' operation. Is 'info' the right log level? Maybe this and the next log statement could be debug level if it would otherwise lead to info being spammed. Maybe we expect people running in this mode to not care about other info level statements but be very interested in knowing that messages were or were not added, I don't know.

You're probably right. It's only useful when running in manual mode and pasting objects in console, useless in most other scenarios. I'll change it to debug.
I guess what I really need here is not logger.info pollution, but the ability to switch logging levels as needed via getopt; I might actually open an issue about that.
EDIT - Done: #1246
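
As a rough illustration of switching levels from the command line: the --log-level option name, the configure_logging helper, and the logger name below are all hypothetical and are not taken from the issue or from PyBitmessage.

```python
import getopt
import logging


def configure_logging(argv):
    """Set a logger's level from a hypothetical --log-level option."""
    opts, _args = getopt.getopt(argv, "", ["log-level="])
    level_name = "INFO"
    for opt, value in opts:
        if opt == "--log-level":
            level_name = value.upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        raise ValueError("unknown log level: %s" % level_name)
    logger = logging.getLogger("netcat-sketch")
    logger.setLevel(level)
    return logger


log = configure_logging(["--log-level", "debug"])
log.debug("STDIN: Already got object %s", "00ff")
```

Duplicate notices logged at debug level then stay out of the way by default and can be switched on for manual sessions.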

coffeedogs · 2018-05-09T10:39:24Z

+        # REFACTOR THIS with objectProcessor into objectProcessorQueue
+        queryreturn = sqlQuery(
+            '''SELECT objecttype, data FROM objectprocessorqueue''')
+        for row in queryreturn:


By waiting until we have synchronously pulled all objects from the database and put all objects on the queue we are missing out on some benefits of parallelism and we'll hit a memory limit with large data sets. Maybe this is unavoidable or related to your suggested refactoring.

Also, keeping the number of places where raw SQL is used to a minimum is a good idea. Again, I assume this would be part of your suggested refactoring.

The block marked by refactor ... /refactor is duplicated verbatim from class objectProcessor; the comment indicates my intention to refactor it into class objectProcessorQueue, a task which is outside the scope of this PR.
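
For the future refactoring, one way to avoid materialising every row at once is to fetch in batches. The sketch below uses the standard sqlite3 module directly rather than PyBitmessage's sqlQuery helper, and load_queue_in_batches is a hypothetical name; the table and column names come from the query in the diff.

```python
import queue
import sqlite3


def load_queue_in_batches(conn, out_queue, batch_size=100):
    """Move rows from objectprocessorqueue into out_queue, batch by batch."""
    cur = conn.execute("SELECT objecttype, data FROM objectprocessorqueue")
    loaded = 0
    while True:
        rows = cur.fetchmany(batch_size)   # at most batch_size rows in memory here
        if not rows:
            break
        for object_type, data in rows:
            out_queue.put((object_type, data))
            loaded += 1
    return loaded
```

The memory saving only materialises if a consumer drains the queue while loading continues, for example a bounded queue.Queue(maxsize=...) read by a worker thread. On Python 2 the module is named Queue.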

coffeedogs · 2018-05-09T10:42:07Z

+        # /REFACTOR THIS
+
+    def run(self):
+        while True:


When the queue is empty would this not cause unnecessary 100% CPU? Perhaps a small sleep is needed inside the while loop?

Edit: no it wouldn't, Queue.get(block=True) blocks when the queue is empty. I was thinking of the AMQP library I was most recently using.
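
The blocking behaviour can be seen in a small stand-alone sketch. The None sentinel and the consume function are choices made for this example only; the module is named Queue on Python 2.

```python
import queue
import threading


def consume(q, results):
    while True:
        item = q.get()      # blocks without spinning while the queue is empty
        if item is None:    # sentinel chosen for this sketch
            break
        results.append(item)


q = queue.Queue()
results = []
worker = threading.Thread(target=consume, args=(q, results))
worker.start()
for obj in (b"a", b"b"):
    q.put(obj)
q.put(None)
worker.join()
print(results)
```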

coffeedogs · 2018-05-09T10:43:18Z

+                        numberOfObjectsInObjProcQueue += 1
+                logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' %
+                             str(numberOfObjectsInObjProcQueue))
+                # /REFACTOR THIS


Yes please!

coffeedogs · 2018-05-09T10:44:50Z

+            objectType, data = queues.objectProcessorQueue.get()
+
+            # Output in documented format
+            print "%016x" % int(time.time()) + '\t' + hexlify(data)


Should we use sys.stdout.write() instead of print here?

It would indeed be the logical choice; I'm just not sure how cross-platform it would be. Print is 100% portable and not wrong.
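
For comparison, a sketch of the same output line built with sys.stdout.write. It is written for Python 3, whereas the diff uses the Python 2 print statement; format_object_line and emit are hypothetical helpers, and the line layout follows the documented format (16 hex digits, a tab, then the hex-encoded object).

```python
import sys
import time
from binascii import hexlify


def format_object_line(data, now=None):
    """Build 'hex_timestamp<TAB>hex_object' plus a newline."""
    stamp = int(time.time() if now is None else now)
    return "%016x\t%s\n" % (stamp, hexlify(data).decode("ascii"))


def emit(data, out=None):
    out = sys.stdout if out is None else out
    out.write(format_object_line(data))
    out.flush()     # push the line out promptly when STDOUT is a pipe


emit(b"\x01\xff")
```

Unlike print, write() adds no newline of its own, so the line ending is explicit in the format string.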

f97ada87 · 2018-05-13T03:05:34Z

@coffeedogs a general comment before I address some specific points: the answer to most of your queries is along the lines of "backward compat", "minimizing the diff" and "one change at a time".
This PR is ported from a private fork that has several special operating modes:

--mode-netcat / -n described here,
--mode-airgap / -a (no network, STDIO to ObjectProcessor) which will be ported next
--mode-router / -r and --mode-seeder / -s which may never be ported

As much as possible, I tried to preserve compatibility with both PyBitmessage and the original fork.

f97ada87 force-pushed the opmode-netcat-v2 branch 2 times, most recently from c6cfe51 to e9425c8 Compare April 16, 2018 11:09

PeterSurda requested review from MahendraNG, PeterSurda, g1itch and omkar1117 April 16, 2018 11:50

g1itch reviewed Apr 16, 2018

omkar1117 suggested changes Apr 16, 2018

f97ada87 force-pushed the opmode-netcat-v2 branch 2 times, most recently from 63893b5 to 7d6a0b5 Compare April 16, 2018 16:00

coffeedogs suggested changes May 9, 2018

f97ada87 force-pushed the opmode-netcat-v2 branch from 7d6a0b5 to 1b61750 Compare May 13, 2018 04:43

implement netcat operating mode

5fd8990

f97ada87 force-pushed the opmode-netcat-v2 branch from 1b61750 to 5fd8990 Compare May 13, 2018 10:33

PeterSurda removed the request for review from MahendraNG May 21, 2018 15:03

PeterSurda added the enhancement New feature label May 21, 2018
