Life Of Navin

Random Musings, Random Bullshit.

Review: PyCon India 2018

The Python community has been one of the few communities that I've actively been part of over the last couple of years, in part because I'm a huge fan of the language (As the saying goes: Python is the second-best language for everything you want to do!) and also because the community has always been one of the most welcoming tech communities I've come across. I've been lucky to be able to contribute back to this amazing community, be it through open source contributions, conducting Python/ML courses for (usually) college students, or by being part of the volunteer team that organizes PyCon India every year. This year, I was part of the CFP workgroup along with a bunch of super cool people, and the months of effort that many people put in culminated last weekend with PyCon India 2018 taking place in Hyderabad. What follows are my thoughts on the event.



First things first: PyCon India 2018 was a grand success! This year was an especially challenging year for PyCon India for multiple reasons:
  • This was the 10th edition of PyCon India, so the community as a whole wanted to have an event which would fit the occasion.
  • Due to multiple disagreements (to put it lightly) within the community, the conference was shifted out from Bangalore to Hyderabad with just 4 months to set up the entire event, basically from scratch.
  • In between all of this, the PSSI was also dissolved (which IMHO was the best thing to happen to the org since it had long ago ceased to stand for anything useful), and the new PyCon team had to take on the added load of figuring out how to handle finances/sponsorships for the event.
Thankfully, right from the get-go, the nation-wide Python community stepped up to ensure that the conference would go through as smoothly as possible. Kudos to the local Hyderabad Python community for leading the efforts! I'm happy to say this is easily in my top 3 PyCon India experiences. Special mention of the food as well. Hyderabad definitely lived up to its foodie reputation and I enjoyed the delicious food served over the course of the conference.

Something that I feel worked really well this year was the EuroPython model of having workgroups for different responsibilities. The model was adopted out of lack of alternatives, but it allowed for much deeper focus, and more community interaction during the course of the conference organisation.

From a talks perspective, I've always complained that PyCon suffers from trying too hard to balance between beginner and advanced talks (mostly because of the diverse audience at PyCon), but this year I personally feel we came really close to striking the right balance. My favourite talks at PyCon India 2018 were:

  • The Future of Python by Armin Ronacher: This was a talk I was really looking forward to. There are very few people who have not used some python module created by Armin. But he's had a very flaky relationship with both Python as a language and as a community over the last few years. His talk, which was the opening keynote of the conference, was an amazing look at what the Rust community is doing right, be it in code, or through community and what the Python community can learn from it. It was lovely to have someone speak so openly about the shortcomings of Python, and point to actionable items, rather than simply praise the emperor's new clothes. The talk included zingers like "The wider python community and core developers want very different futures for the language" in front of an audience of 4+ python-core devs and 1500 wider community members! :P


  • Large scale web crawling using Python by Anand B Pillai and Noufal Ibrahim: This talk was a pleasant surprise, especially since I spent nearly a year working on a product which did web scraping at a scale which can only be described as large! I had never heard of their project, but it's funny how similar it is to the system we used at Bloomreach (which is more complex, but also has half a decade of engineering behind it). It was interesting to compare the compromises we made with the ones they made, given very similar constraints and the same issues.

  • Cleaning data with Python by Anand S: Anand is someone whose talks at any conference are always of the same outstanding quality and this talk was no exception. While a decidedly unsexy topic to talk about, the problem of data cleaning is something that literally anyone who works with data runs into nearly every time. Anand's talk focused on different techniques for extracting and cleaning data, and how to handle the billion possible edge-cases you can end up with. Every PyCon, I find one amazing talk which uses live code through the talk, and this was the one from this PyCon (Yes, he used a pre-made Jupyter notebook but that still counts).

  • The M-Word by Sidu Ponnappa: Sidu is, again, a not-so-obvious keynote speaker for a PyCon, since he's not really a Pythonista in any significant way. But what he has done is build communities and companies which are very strongly engineer-driven and built from the bottom-up. His talk was about the dreaded M-word, "Manager". His talk was more of a brain dump on the parallels between code management and org management, and how different structures and dynamics come into play in organisations, in the same way that systems grow in complexity over time. His talk was enlightening (except maybe the first 20 minutes which felt more like a Go-Jek pitch, and could have been much shorter) and highlighted how engineering principles apply to org management.

  • Lightning Talks: Lightning talks are 5 minute talks given by people from the community about any topic they feel is relevant to the conference. In the past, lightning talks have been pretty meh, with unprepared speakers, half-baked slides, incoherent talks et al, but this year was very different. The lightning talks were really well presented, tight talks about projects ranging from an RC car controlled using EEG waves (and using an LSTM neural net), to the role Python played in the post-floods rescue operations in Kerala, to types of engineers and how they shift between types, to using Python for astronomy, to the super fun talk by @datapythonista on, well... random nuggets of coolness! Special mention of the organisation of lightning talks with dual-laptop round-robin screen switching, which ensured the downtime between lightning talks was practically zero.
@datapythonista doing his thing!
Of course, as with any volunteer-driven event, there are always things that can be improved upon. These are the things I didn't particularly love about PyCon India this year and that I hope we can improve next year:

  • Main Hall Setup: The main hall setup had multiple issues. The big one was that the sponsor stalls were in the same room as the main hall. Unfortunately this meant that many talks were disturbed by the noise from the sponsor stalls, where they had ongoing events for participants like quizzes/darts etc. You could hear the noise even from the third row of the main hall and it was very distracting. The other issue was the sound setup in the same hall, which was messed up: throughout the conference, the speaker could not hear anyone from the audience who used a mike (relevant during the QnA session that follows every talk).
  • Dev Sprints: Dev sprints provide an opportunity for coders to work with open source project maintainers to create patches for open source projects. The barrier to entry for open source is, even today, seen as being high, and events like this help break that myth. Unfortunately, the dev sprints were messed up this time. Tickets were announced very late, and even then they were paid and limited to 150 seats. Charging someone money to introduce them to open source contribution is ridiculous IMHO. We need as many good developers as we can get (and then some more), and charging for and limiting the number of participants was pretty bad. This was called out on Twitter by multiple people, and I genuinely think we could have done a better job of it.
  • Job board: It's a known fact that PyCon acts as a platform for companies and developers to meet each other and create win-win situations where developers get connected to good jobs which require their skill-set while companies get talented individuals to interview/join. This year, we had a job board, which was basically a flex sheet, and companies were encouraged to write their details on sticky notes and paste them on the job board. People could browse through the board and apply/look for more details. It's ironic that PyCon took such a non-technical approach to this problem. A simple web-app would have been so much more useful for this, and been less painful, more easily searchable, and definitely more practical.
The job board at the end of PyCon
Overall, as I mentioned at the start of the blog, PyCon India 2018 was a stunning success and I truly enjoyed myself throughout my stay in Hyderabad. I definitely look forward to participating in PyCon India 2019, both as an organizing team volunteer as well as a participant (Possible speaker as well? Maybe? Yes? No? :P), because after all these years, what still remains true is this:

>>> import this

Google Authenticator 2FA on Mac OS with oathtool

Multi-factor authentication (commonly 2FA) is a security godsend. You really should enable it on every account you have that supports it. I usually use Google Authenticator for time-variant OTP systems. However, I find it irritating to have to reach for my phone to check for the OTP every time I log in to a site. Why do I find it irritating? For two reasons:
  1. I actually have to get to my physical device, which I usually don't carry around with me. 
    • In office, I usually keep my phone away from me to avoid distraction/unwanted calls/notifications.
  2. Even after I open up Authenticator, I have to type out the digits from screen to screen. 
    • I know there are other alternatives, but Google Authenticator is most widely supported across services I use daily: AWS, Okta, Google Suite, Pager etc. and it's not going away soon.
I always wondered if there is an easier way to get the added security of 2FA, with more flexibility. And the solution was to simply use 2FA from my laptop itself. So I came up with a workflow. This is my current workflow on Mac OS Sierra (but should work on any Mac).

Step 1: Generate 2FA tokens on laptop
This turned out to be simpler than I imagined. I simply needed to use the awesome oath-toolkit, which basically behaves like Google Authenticator for the laptop. It can be used to easily generate time-variant one time passwords, which is what GA does! Setting it up is a breeze:

$ brew install oath-toolkit
$ oathtool --totp -b "your_key_here" | pbcopy

The key you use here is a key generated when you set up OTP (You probably remember scanning a QR code). If you don't have the key saved locally, you will have to do a one-time reset of your 2FA to generate a new key.

We pipe the generated token to pbcopy to copy it to clipboard. This allows us to generate totp tokens and have them ready to be pasted when needed. No need to look for your mobile phone for OTP anymore!
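
For the curious, here is a rough sketch of what a TOTP generator like oathtool computes for `--totp -b`. It assumes the defaults that Google Authenticator also uses (HMAC-SHA1, 30-second time step, 6 digits) and follows RFC 6238; the function name `totp` and its parameters are my own for illustration, not part of oath-toolkit.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(base32_key, at=None, step=30, digits=6):
    """Return the RFC 6238 TOTP code for a base32-encoded shared key."""
    # Keys are often displayed in lowercase, with spaces and without padding.
    cleaned = base32_key.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)

    # The moving factor is the number of `step`-second intervals since the epoch.
    now = time.time() if at is None else at
    counter = struct.pack(">Q", int(now // step))

    # HOTP (RFC 4226): HMAC-SHA1, then dynamic truncation to a 31-bit integer.
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)
```

Calling `totp("your_key_here")` with your real base32 key should produce the same 6 digits your phone shows, as long as the clocks agree. The same caveat as for the shell command applies: the key sits in plain text wherever you put it.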

Step 2: Smoother Workflow
Now that we can generate OTP tokens without a phone, let's clean up the process. Currently, we need to go to the terminal every time we want to generate a new token. What if we could bind this to a global key command instead? Mac OS has an awful global keyboard macro system (Yay Linux as always), but it can still be done. This is how I do it:
  • Install ICanHazShortcut, a nifty Mac app for global hotkeys. It's not too fancy, but delivers what it promises!
  • Set up an Automator service to simply paste clipboard contents when called:
    • Go to Applications -> Automator -> New -> Service
    • Service receives "no input" in "any application"
    • Drag the "Run Applescript" action to the right and fill in the following and then save the workflow as paste-from-clipboard. You can save it in your home directory:

  • Finally, bring it all together! Set up keyboard shortcuts in ICanHazShortcut, and set the command to run as:  
/usr/local/bin/oathtool --totp -b "your_key_here" | pbcopy | automator ~/paste-from-clipboard.workflow/
Here I assume that the workflow you created was named paste-from-clipboard and saved in your home directory. You may not need to specify the full path for oathtool, but I had to.

And that's it! Now whenever you're on a page that needs a 2FA token, simply type in your keyboard shortcut, and your token will be filled in!

...And the love kickstarts again ;)

PS: Yes, this is unsafe, because if someone gets physical access to your system, they can access your 2FA key and generate tokens, thereby rendering your 2FA useless. Then again, if they have physical access to your system, this is just one of a million ways to get at your data, so the increase in attack surface is not that significant. Of course, if you're uber concerned about security, skip this. But if you want a compromise between security and usability then this is a decent solution! :)

Debugging Running JVM Without Restart

Debugging a process actively running on a JVM usually means restarting the application with the right flags set and attaching a debugger, which is a bit of a pain. However, there's quite a bit you can do to debug a running process without needing a JVM restart.

This. But with less enthusiasm. Much less enthusiasm.
Here's a quick walkthrough, with the associated commands. I'm simply compiling the steps to allow a single lookup for everyone else who has to walk down this road.
  • Ensure that it is the JVM which is consuming too many resources. A combination of free and sorted ps output should do the trick. free tells us current memory consumption, while ps gives us process-level statistics.
$ free -m
                        total       used       free     shared    buffers     cached
Mem:                    15040      13956       1083          0         84        890
-/+ buffers/cache:                 12980       2059
Swap:                       0          0          0

$ ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -5
69.8 67.5 8180544 24204 java -Duser.dir=[...]
 7.5  5.6 1550956  4866 python /mnt/manage.py run_gunicorn -c /mnt/conf.py
 7.4  6.3 1542356  4848 python /mnt/manage.py run_gunicorn -c /mnt/conf.py
 7.4  6.0 1548132  4863 python /mnt/manage.py run_gunicorn -c /mnt/conf.py
 7.4  5.8 1537104  4869 python /mnt/manage.py run_gunicorn -c /mnt/conf.py

Wow, so we seem to have barely a GB of memory free, a resource hog Java process and a bunch of gunicorn processes. Let's take a look at the Java process, shall we? The pid is 24204 (the fourth column of the first row in the ps output above).
  • Let's find thread utilization of resources. ps to the rescue again!
$ ps -mo 'pid lwp stime time pcpu' -p 24204
  PID   LWP STIME     TIME %CPU
24204     - 16:12 00:32:41 17.4
    - 24204 16:12 00:00:00  0.0
    - 24255 16:12 00:00:03  0.0
    - 24256 16:12 00:01:11  0.6
    - 24257 16:12 00:01:11  0.6
    - 24258 16:12 00:01:11  0.6
    - 24259 16:12 00:01:11  0.6
    - 24260 16:12 00:06:34 33.5
    - 24261 16:12 00:00:11  0.0
    - 24262 16:12 00:00:00  0.0
    - 24263 16:12 00:00:00  0.0
    - 24264 16:12 00:00:00  0.0
    - 24265 16:12 00:00:00  0.0
    - 24266 16:12 00:00:43  0.3
Interesting. It looks like a single thread (LWP 24260) is responsible for most of the CPU usage. How do we trace this? Why, we take a couple of thread dumps of course! We use jstack for this. jstack simply produces a thread dump of the given process.

jstack 24204 > 1.dump
jstack 24204 > 2.dump
jstack 24204 > 3.dump

I usually take multiple jstacks just for further analysis. You may also need to run jstack as the user that owns the JVM process, using sudo: sudo -u user jstack <pid>
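
If eyeballing the per-thread listing gets tedious, picking out the hot thread can be scripted. The snippet below is only a sketch: it assumes the five-column layout produced by the `ps -mo 'pid lwp stime time pcpu'` command above, and `busiest_thread` is a made-up helper name.

```python
def busiest_thread(ps_output):
    """Return (lwp, cpu_percent) for the thread with the highest %CPU."""
    threads = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Thread rows look like: "- <lwp> <stime> <time> <%cpu>".
        # The header and the per-process summary row (LWP shown as "-") are skipped.
        if len(fields) == 5 and fields[0] == "-" and fields[1].isdigit():
            threads.append((int(fields[1]), float(fields[4])))
    # Raises ValueError if no thread rows were found.
    return max(threads, key=lambda t: t[1])


sample = """\
  PID   LWP STIME     TIME %CPU
24204     - 16:12 00:32:41 17.4
    - 24259 16:12 00:01:11  0.6
    - 24260 16:12 00:06:34 33.5
    - 24261 16:12 00:00:11  0.0
"""
print(busiest_thread(sample))  # (24260, 33.5)
```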

  • Now we have the thread dump and the misbehaving thread, so we can check what that thread is doing. The thread id from the previous command (24260) is in decimal, so we first convert it into hex, because thread dumps record native thread ids in hex (the nid field). To do this we simply use bc to convert the number, and then tr to transform the uppercase into lowercase.
$ echo "obase=16; 24260" | bc | tr '[:upper:]' '[:lower:]'
5ec4  # this is the thread id in hex

$ vim 1.dump # Now search for the thread hex in the dump file
[...]
"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007fea2c083000 nid=0x5ec4 runnable
[...]
So it turns out in this specific case that the GC is taking up a massive number of CPU cycles. The solutions for this are numerous, and may vary depending on what your constraints are. In other cases, the culprit may be database access, I/O access, or, quite frankly, a million other reasons relating to badly written code.
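
The decimal-to-hex conversion and the search through the dump can also be done in one go instead of with bc, tr and vim. This is a minimal sketch, assuming the dump has the `nid=0x...` field shown above; `lwp_to_nid` and `find_thread` are hypothetical helper names.

```python
def lwp_to_nid(lwp):
    """Format a decimal LWP id the way it appears in a thread dump."""
    return "nid=0x{:x}".format(lwp)


def find_thread(dump_text, lwp):
    """Return the thread header lines whose nid matches the given LWP."""
    nid = lwp_to_nid(lwp)
    # Compare whole tokens so that nid=0x5ec4 does not also match nid=0x5ec40.
    return [line for line in dump_text.splitlines() if nid in line.split()]


dump = '''"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007fea2c083000 nid=0x5ec4 runnable
"Some Other Thread" os_prio=0 tid=0x00007fea2c085000 nid=0x5ec40 runnable
'''
print(lwp_to_nid(24260))  # nid=0x5ec4
for line in find_thread(dump, 24260):
    print(line)
```

In practice you would read the file first, e.g. `find_thread(open("1.dump").read(), 24260)`, and then look at the stack trace that follows the matching line.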

To be honest, this method of debugging does have its constraints, but it can be a godsend when you need to quickly validate your thinking when it comes to resource hog bugs. And of course, you can celebrate by posting memes on Slack!

At Bloomreach, we prefer :patre: #insidejoke
Hope this is helpful! Until the next b̶u̶g̶ feature...  :D

PS: Yes, I know the first couple of steps can be done using htop as well, using htop in thread view. I just find ps so much more convenient


The Best Teacher.


I am .
I .
I .
I will win!
This page generates a new set of typos every time someone visits the page. It has enough variation that you can (almost) safely claim that every person on this site sees a personalized set of mistakes, never seen before by anyone and never to be seen again by anyone. Eternally looking for perfection, just like you and me.

Prologue

Finally after all these years, here's to the beginning of what was there, what is there and hopefully what will remain!! So here are my thoughts & words -Online!!