Consuming Multiple Archives Into A Single Model For A twitter_ebooks Bot

Published: May 25, 2017


Recently, I launched my own ebooks bot.

If you read the twitter_ebooks README, you’ll see that the ebooks consume command generates a text model for the bot to work from, based on either a JSON archive of tweets or a plain text file.

This is nice, but it left me wondering: can I build my text model from multiple sources?

It’s not currently documented in the README, but it turns out you can, using the ebooks consume-all command.

Its signature is as follows:

$ ebooks consume-all <model_name> <corpus_path> [corpus_path2] [...]

You’ll see consume-all mentioned in the usage string if you run ebooks with no arguments.

$ ebooks
     ebooks help <command>

     ebooks new <reponame>
     ebooks s[tart]
     ebooks c[onsole]
     ebooks auth
     ebooks consume <corpus_path> [corpus_path2] [...]
     ebooks consume-all <model_name> <corpus_path> [corpus_path2] [...]
     ebooks append <model_name> <corpus_path>
     ebooks gen <model_path> [input]
     ebooks archive <username> [path]
     ebooks tweet <model_path> <botname>
     ebooks version

While this is helpful, I think the feature should ideally be documented in the README as well.

I submitted a PR to do just that here. However, the project hasn’t been updated in a while, so I’m not sure if or when it will be merged.


I hope you found this post helpful. If you have any questions or comments, feel free to drop a note below, or, as always, you can reach me on Twitter as well.

Max Chadwick

Hi, I'm Max!

I'm a software developer who mainly works in PHP, but loves dabbling in other languages like Go and Ruby. Technical topics that interest me are monitoring, security and performance. I'm also a stickler for good documentation and clear technical writing.

During the day I lead a team of developers and solve challenging technical problems at Something Digital where I mainly work with the Magento platform. I've also spoken at a number of events.

In my spare time I blog about tech, work on open source and participate in bug bounty programs.

If you'd like to get in contact, you can find me on Twitter and LinkedIn.