08 March 2015

The Kaspersky Code

by Leigh Lundin

Three weeks ago, Kaspersky Lab, the Russian security software maker, exposed a cyber-espionage operation that many believe originated within the NSA. The devilishly clever bit of code hides in the firmware of disc drives and can reinfect a machine again and again, even after the operating system is wiped and reinstalled. If you use a Windows computer, there’s a good chance it’s not only infected but was built that way, likely without the manufacturers’ knowledge.

Kaspersky researcher Costin Raiu says the NSA couldn’t have done it without the source code.

What?!!

The contention that the NSA must have had access to the source code is not only patent nonsense but also ignores the fact that Kaspersky themselves supposedly didn’t have the code. Having the source code is the easy way, perhaps the preferred way, but it’s hardly the only way.

A Reuters article speculates on how the NSA might have obtained the source code, and indeed, one of its scenarios is likely. But it’s also feasible to do the job without the source, and I’ll show you what I mean using a technique I used to unravel computer fraud programs. Fasten your seat belt, because this is going to get technical.

World’s Greatest Puzzle

Those around in my Criminal Brief days know that I love puzzles. For me, the ultimate puzzle has been systems software programming: making the machine do what I want. But sometimes I’ve faced puzzles, some benign, some not, without access to the source code.

Let’s try an example. What if we found mysterious code in our computer that looked something like this:

[Image: Mysterious Snippet of Computer Code]

If you can’t make sense of this, you’re not alone: 98% of computer programmers wouldn’t know what to make of it either. But look closely and you’ll see that the data populating the upper block looks different from the data in the lower block. This is a clue.
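We can’t reproduce the bytes from the original image here, but one common first pass when staring at an unknown dump is to measure how much of each block is printable text: character data tends to score high, while machine instructions and binary numbers usually score lower. The Python sketch below is a deliberately crude heuristic, not the author’s method; the function names, the 0.9 threshold, and both sample blocks are hypothetical.

```python
import string

# Byte values of printable ASCII characters (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(block: bytes) -> float:
    """Fraction of bytes in the block that are printable ASCII characters."""
    if not block:
        return 0.0
    return sum(b in PRINTABLE for b in block) / len(block)


def guess_kind(block: bytes, threshold: float = 0.9) -> str:
    """Crude guess: mostly printable bytes suggest text; otherwise code or binary data."""
    if printable_ratio(block) >= threshold:
        return "likely text"
    return "possibly code or binary data"


if __name__ == "__main__":
    # Hypothetical samples: a run of names, and sixteen bytes of toy machine code.
    text_like = b"APPLES  ORANGES  TOTAL  "
    code_like = bytes([0x40, 0x23, 0x10, 0x00, 0x40, 0x24, 0x11, 0x00,
                       0x21, 0x25, 0x23, 0x24, 0x50, 0x25, 0x12, 0x00])
    for name, block in (("text_like", text_like), ("code_like", code_like)):
        print(name, f"{printable_ratio(block):.3f}", guess_kind(block))
```

Run as a script, this reports a ratio of 1.000 for the text sample and 0.625 for the code sample. A real analyst would combine several such clues (alignment, repeating patterns, known op-codes) rather than trust any single number.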

Unlike commercial and scientific programs, systems software deals with the operation of the computer itself: utilities, communications, and especially the operating system. The realm of a computer’s internals is abstract, far more so than in the Tron movies. Key aspects seldom have real-world equivalents. Sure, we say that RAM is a little like notes spread out on your work table and that disc storage is kinda sorta like a file cabinet… but not really. Even the term RAM (random access memory) is misleading: there’s nothing random about it. “Random access” only means the computer can jump directly to any location instead of reading through memory in order, as it would with a tape.

Back in the real world, let’s say you want to write a simple program that adds the number of apples to the number of oranges. In most programming languages, the code would look like this:
total = apples + oranges
Internally, a program loads apples and oranges into registers (kind of like keying them into a calculator), adds them, and stores the sum in a variable called total. If we wrote this in the argot of the computer, we’d use assembly language mnemonics, an abstraction of the computer’s machine language. Deep, deep down in a program, we’d see nothing but numbers, usually written in hexadecimal (base 16), where we count…
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
Yes, A-F are digits in this context, standing for the values ten through fifteen. Within the computer, our little program above might resemble…

[Image: simple pseudo-code program, total = apples + oranges]
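In the comments at the end of this post, the author explains that his pretend machine uses op-code 40 for LOAD, 21 for ADD, and 50 for STORE, working with registers 23, 24, and 25 (all hexadecimal). The Python sketch below is a toy assembler that turns mnemonics into those bytes. The fixed four-byte instruction layout, the memory addresses for `oranges`, `apples`, and `total`, and the names `OPCODES`, `ADDRESSES`, and `assemble` are hypothetical, invented for illustration; the original image may lay things out differently.

```python
# Op-codes from the author's pretend machine (hexadecimal).
OPCODES = {"LOAD": 0x40, "ADD": 0x21, "STORE": 0x50}

# Hypothetical memory addresses for the program's variables.
ADDRESSES = {"oranges": 0x10, "apples": 0x11, "total": 0x12}


def operand_byte(token: str) -> int:
    """Translate one operand: 'R23' means register 0x23; a name means a variable's address."""
    if token.startswith("R"):
        return int(token[1:], 16)
    return ADDRESSES[token]


def assemble(source: str) -> bytes:
    """Assemble one instruction per line into fixed four-byte instructions (assumed layout)."""
    out = bytearray()
    for line in source.strip().splitlines():
        mnemonic, _, rest = line.strip().partition(" ")
        operands = [operand_byte(t.strip()) for t in rest.split(",")]
        instruction = [OPCODES[mnemonic], *operands]
        out.extend(instruction + [0x00] * (4 - len(instruction)))  # pad to four bytes
    return bytes(out)


# total = apples + oranges, spelled out in assembly language mnemonics.
PROGRAM = """
LOAD  R23, oranges
LOAD  R24, apples
ADD   R25, R23, R24
STORE R25, total
"""

if __name__ == "__main__":
    print(assemble(PROGRAM).hex(" ").upper())
```

Run as a script, it prints `40 23 10 00 40 24 11 00 21 25 23 24 50 25 12 00`: four lines of mnemonics become sixteen hexadecimal numbers, which is all the machine ever sees.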

What isn’t obvious to many programmers is that computer instructions are data. Indeed, some black-hat crackers (the bad guys) have used this property to sneak malware onto unsuspecting computers.
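To see why instructions are data, consider the bytes of a single instruction. The minimal Python sketch below reads the same four bytes three ways: as a number, as text, and as an ADD instruction. The op-code value 21 comes from the author’s comments at the end of this post; the byte layout (op-code, destination register, two source registers) is our assumption.

```python
# One ADD instruction for the hypothetical toy machine (assumed layout:
# op-code, destination register, source register, source register).
raw = bytes([0x21, 0x25, 0x23, 0x24])

# 1. As a single 32-bit big-endian number: plain data.
as_number = int.from_bytes(raw, "big")

# 2. As ASCII text: each of these bytes happens to be a printable character.
as_text = raw.decode("ascii")

# 3. As an instruction: only the interpretation changes, never the bytes.
opcode, dest, src1, src2 = raw
as_instruction = (
    f"ADD R{dest:02X}, R{src1:02X}, R{src2:02X}" if opcode == 0x21 else "unknown op-code"
)

if __name__ == "__main__":
    print(as_number)       # 556081956
    print(as_text)         # !%#$
    print(as_instruction)  # ADD R25, R23, R24
```

Nothing in the bytes says which reading is “right”; the processor simply executes whatever its program counter points at. That is the property malware exploits when it smuggles instructions in disguised as ordinary data.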

If you look again at the original sneak peek of data, you’ll start to see patterns and may even pick out the machine instructions from our code example above.

[Image: Less Mysterious Code Snippet]

This puzzle solving is called reverse engineering. It’s possible to write a program called a disassembler (I have) or a decompiler (I haven’t) to decode the machine language into something more intelligible. The program has to be smart enough not only to separate actual data from instructions but also to distinguish types of data.
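Here is a minimal sketch of a disassembler for the same pretend machine, written in Python. It uses the simplest strategy, a linear sweep: decode a known op-code as an instruction, and treat anything unrecognized as a raw data byte (written `DB`, a common assembler notation for “define byte”). The op-codes follow the author’s comments at the end of this post; the four-byte layout, the `[..]` notation for memory addresses, and the function name are hypothetical.

```python
# Op-codes of the author's pretend machine. Under our assumed layout every
# instruction is four bytes: op-code plus three operand bytes.
MNEMONICS = {0x40: "LOAD", 0x21: "ADD", 0x50: "STORE"}
INSTRUCTION_SIZE = 4


def disassemble(code: bytes) -> list[str]:
    """Linear sweep: decode known op-codes; emit anything else as a raw data byte."""
    lines = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        chunk = code[offset:offset + INSTRUCTION_SIZE]
        if opcode in MNEMONICS and len(chunk) == INSTRUCTION_SIZE:
            _, a, b, c = chunk
            if opcode == 0x21:  # ADD destination, source, source: all registers
                text = f"ADD   R{a:02X}, R{b:02X}, R{c:02X}"
            else:               # LOAD/STORE register, memory address (last byte is padding)
                text = f"{MNEMONICS[opcode]:<5} R{a:02X}, [{b:02X}]"
            lines.append(f"{offset:04X}  {text}")
            offset += INSTRUCTION_SIZE
        else:
            # Not a recognizable instruction: treat it as data, one byte at a time.
            lines.append(f"{offset:04X}  DB    {opcode:02X}")
            offset += 1
    return lines


if __name__ == "__main__":
    program = bytes([0x40, 0x23, 0x10, 0x00, 0x40, 0x24, 0x11, 0x00,
                     0x21, 0x25, 0x23, 0x24, 0x50, 0x25, 0x12, 0x00,
                     0x05, 0x07])  # two trailing data bytes
    print("\n".join(disassemble(program)))
```

The weakness is exactly the one the paragraph describes: if a data byte happened to equal 40, this naive sweep would mistake it for a LOAD. Serious disassemblers follow jumps and calls to work out which bytes are actually reachable as code.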

As you see, compiling source into binary executable code isn’t a one-way street. With dedication and know-how, reversing the process is well within reach.
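A real-world cousin you can try at home is Python’s standard `dis` module, which disassembles compiled CPython bytecode. Bytecode is the instruction set of Python’s virtual machine, not the processor or disk-controller machine code in the NSA story, so treat this as an analogy. Compiling the one-line program from earlier and disassembling it recovers the two loads, the add, the store, and even the variable names from the compiled object alone. Exact op-code names vary between Python versions (newer versions use `BINARY_OP` where older ones used `BINARY_ADD`), so no expected output is shown.

```python
import dis

# Compile the one-line program without running it, so the variables
# don't need to exist yet.
code = compile("total = apples + oranges", "<article>", "exec")

# The compiled instructions are just bytes, i.e. data.
print(type(code.co_code).__name__, len(code.co_code), "bytes")

# Walk the decoded instructions and recover the names the source used.
for instruction in dis.get_instructions(code):
    print(f"{instruction.opname:<12} {instruction.argrepr}")

# dis.dis() prints a fuller listing with offsets and line numbers.
dis.dis(code)
```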

How safe do you feel now?

15 comments:

Melodie Campbell said...

I feel like all those years of university calculus were a waste of time! Humbled. Very interesting, Leigh.

Louis A. Willis said...

Well, I now know why I got a headache and my eyeball hurt when I once tried to learn how to code.

Leigh Lundin said...

Thank you for being the first brave persons to comment, Melodie and Louis. It's like taking the 'thrill' out of techno-thrillers, but genuine SleuthSayers couldn’t let a patently wrong statement from Kaspersky influence writers and the greater public! (Everyone believes that, right?)

R.T. Lawton said...

Leigh, keep informing us on what's going on even if we don't totally understand the technology involved.

Leigh Lundin said...

RT, thank you. I tried to make comprehensible a topic that the vast majority of programmers never see but that affects our lives in ways we never imagine.

Dixon Hill said...

Thank you, Leigh. I don't know about everyone, but I certainly believe that. And this was a terrific article! Absolutely fascinating.
--Dixon

Eve Fisher said...

Fascinating - I took a C++ and a Java class, and that was enough for me. Keep the information coming!

Vicki Kennedy said...

I agree, Leigh. Keep the articles coming. I like to stay informed although figuring all that out was a total mystery to me. I wish I had your knowledge and ability with computers, but I don't. Thanks for an interesting article and for keeping us all informed. I'd rather be aware of what's going on than hiding my head in the sand.

A Broad Abroad said...

The techno-spy stuff is all well and good (or bad, in this instance). What we non-hexadecimal-reading mortals fear is the danger of the powers-that-be deciding – for our own good, of course – to insert this into every machine prior to sale. Then, one day – should they decide I've become a 'person of interest' – they will have entrée into the workings of my ‘puter without so much as a by-your-leave. Frightening prospect. Thank you for scaring the bejeebers out of us.

Jeff Baker said...

Likewise as an illiterate in the world of programming I found this amazing! I've always assumed everyone can see anything I've done on computer. Leigh, Alan Turing would be proud of you!

Anonymous said...

Wow, like oh my, and wow again! This makes me too feel like a dumbbell. I taught Latin several years, but this piece shows me I'm really an empty pile of non-brain!! Thanks. T. Straw in Manhattan

Leigh Lundin said...

Thank you, Dixon. You’re very kind. I worried about swamping everyone.

Eve, absorbing Java and especially C++ is rather like trial by fire. Despite computers becoming ‘smarter’, it seems we’re making programming more difficult for humans to learn.

Leigh Lundin said...

Vicki, computer geeks have often been an insubordinate lot, so we have that small advantage. As you say, too many people would rather hide their heads in the sand.

ABA, that sums matters up well. We are at the point where governments tap into our most private communications and, by viewing what we search for and read, even our most private thoughts.

Leigh Lundin said...

Jeff, thank you. Turing has always been an icon in the computer realm. By the way, I just saw The Imitation Game about Alan Turing and it is excellent. I found the film both smart and sensitive.

Thank you, Thelma, but don’t feel that way at all! It takes a peculiar brain to grok what we call computer ‘internals’, as strongly suggested in the Turing movie. Meanwhile, I took two years of Latin and struggled the entire time! Some minds are better at one discipline and others handle different challenges uniquely their own way.

Leigh Lundin said...

In response to a question via email that asked about the numbers 23 and 24 in the code:

The numbers you asked about, 23 and 24, refer to internal thingies called ‘registers’, special devices for arithmetic, logical operations, and addressing. In my pretend machine, I allowed 256 registers numbered 00-FF and chose to work with registers 23, 24, and 25.

The 40 you see in the LOAD instructions, the 21 in the ADD, and the 50 for the STORE are called op-codes, or operation codes… basic instructions to the computer. The op-code is followed by a register. For example, the first instruction we encounter LOADs the number of oranges into register 23 and the next LOADs the number of apples into register 24. The ADD instruction adds registers 23 and 24 together and puts the sum into register 25. Finally, we STORE the result from register 25 into the variable ‘total’.
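For readers who want to watch the pretend machine run, here is a minimal emulator sketch in Python. The op-codes (40 LOAD, 21 ADD, 50 STORE) and the 256 registers numbered 00-FF follow the description above; the four-byte instruction layout, the memory addresses, the load address of the program, the FF “halt” op-code, and the sample fruit counts are all hypothetical, invented for illustration.

```python
def run(memory: bytearray, start: int = 0) -> list[int]:
    """Execute four-byte instructions from memory until a HALT (FF) op-code; return registers."""
    registers = [0] * 256            # registers 00-FF, as in the pretend machine
    pc = start                       # program counter: address of the next instruction
    while True:
        opcode, a, b, c = memory[pc:pc + 4]  # fetch one instruction (assumed layout)
        if opcode == 0x40:           # LOAD:  register a <- memory[b]
            registers[a] = memory[b]
        elif opcode == 0x21:         # ADD:   register a <- register b + register c
            registers[a] = (registers[b] + registers[c]) & 0xFF  # wrap to one byte
        elif opcode == 0x50:         # STORE: memory[b] <- register a
            memory[b] = registers[a]
        elif opcode == 0xFF:         # HALT (hypothetical op-code)
            return registers
        else:
            raise ValueError(f"unknown op-code {opcode:02X} at address {pc:02X}")
        pc += 4


# Hypothetical addresses: variables at 10-12, program loaded at 20 (all hex).
ORANGES, APPLES, TOTAL, PROGRAM_START = 0x10, 0x11, 0x12, 0x20

PROGRAM = bytes([
    0x40, 0x23, ORANGES, 0x00,   # LOAD  the oranges into register 23
    0x40, 0x24, APPLES, 0x00,    # LOAD  the apples into register 24
    0x21, 0x25, 0x23, 0x24,      # ADD   registers 23 and 24, sum into register 25
    0x50, 0x25, TOTAL, 0x00,     # STORE register 25 into total
    0xFF, 0x00, 0x00, 0x00,      # HALT
])


def add_fruit(oranges: int, apples: int) -> int:
    """Place the inputs and the program in a small memory, run it, and read back total."""
    memory = bytearray(64)
    memory[ORANGES], memory[APPLES] = oranges, apples
    memory[PROGRAM_START:PROGRAM_START + len(PROGRAM)] = PROGRAM
    run(memory, PROGRAM_START)
    return memory[TOTAL]


if __name__ == "__main__":
    print(add_fruit(5, 7))  # 12
```

Note that the program and the fruit counts share one memory array: the instructions are just more numbers, which is the whole point of the article.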

My friend Thrush is a brother-in-arms in the internal workings of computers and sent me this posting regarding Seagate and the currency of the NSA ‘infection’.