Java is a safe programming language and prevents programmers
from making a lot of stupid mistakes, most of which are related to memory management.
But there is a way to make such mistakes intentionally: the Unsafe class.
This article is a quick overview of the sun.misc.Unsafe public API and a few
interesting cases of its usage.
Unsafe instantiation
Before use, we need to create an instance of the Unsafe object.
There is no simple way to do it like Unsafe unsafe = new Unsafe(),
because the Unsafe class has a private constructor. It also has a static
getUnsafe() method, but if you naively try to call Unsafe.getUnsafe(), you will probably
get a SecurityException. This method is available only to trusted code.
getUnsafe() decides whether code is trusted simply by checking that the calling class was loaded by the primary (bootstrap) class loader.
We can make our code “trusted”: use the -Xbootclasspath option when running
your program and specify the
path to the system classes plus your own classes that use Unsafe.
But that is too cumbersome.
The Unsafe class holds its own instance in a private static field called theUnsafe.
We can steal that field via Java reflection.
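A minimal sketch of the reflection trick, assuming a HotSpot-based JDK where the field is still called theUnsafe (the class name UnsafeAccess is made up for this example). On JDK 9 and later, sun.misc.Unsafe lives in the jdk.unsupported module, which opens the sun.misc package for this kind of access; javac still prints an "internal proprietary API" warning.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UnsafeAccess {
    // Reads the private static field "theUnsafe" that holds the singleton instance.
    static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);         // bypass the private modifier
            return (Unsafe) f.get(null);   // static field, so no receiver object
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read Unsafe.theUnsafe", e);
        }
    }

    public static void main(String[] args) {
        Unsafe unsafe = getUnsafe();
        System.out.println("address size: " + unsafe.addressSize() + " bytes");
    }
}
```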
Note: ignore your IDE. For example, Eclipse shows the error “Access restriction…”,
but if you run the code, everything works just fine. If the error is annoying, downgrade errors on
Unsafe usage in:
Preferences -> Java -> Compiler -> Errors/Warnings ->
Deprecated and restricted API -> Forbidden reference -> Warning
Unsafe API
The class sun.misc.Unsafe
consists of 105 methods. There are actually
a few groups of important methods for manipulating various entities.
Here are some of them:
- Info. Just returns some low-level memory information: addressSize, pageSize.
- Objects. Provides methods for manipulating objects and their fields: allocateInstance, objectFieldOffset.
- Classes. Provides methods for manipulating classes and static fields: staticFieldOffset, defineClass, defineAnonymousClass, ensureClassInitialized.
- Arrays. Array manipulation: arrayBaseOffset, arrayIndexScale.
- Synchronization. Low-level primitives for synchronization: monitorEnter, tryMonitorEnter, monitorExit, compareAndSwapInt, putOrderedInt.
- Memory. Direct memory access methods: allocateMemory, copyMemory, freeMemory, getAddress, getInt, putInt.
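A short sketch touching the Info and Arrays groups, assuming the same reflective access to theUnsafe. The readInt helper is hypothetical; it shows how arrayBaseOffset and arrayIndexScale combine into the offset of an array element.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class UnsafeInfo {
    static final Unsafe U = load();

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Arrays group: element i of an int[] lives at base offset + i * index scale.
    static int readInt(int[] array, int index) {
        long offset = U.arrayBaseOffset(int[].class)
                + (long) index * U.arrayIndexScale(int[].class);
        return U.getInt(array, offset); // no bounds check: caller must pass a valid index
    }

    public static void main(String[] args) {
        System.out.println("addressSize = " + U.addressSize()); // 4 or 8 bytes
        System.out.println("pageSize    = " + U.pageSize());
        System.out.println("int[] base  = " + U.arrayBaseOffset(int[].class));
        System.out.println("int[] scale = " + U.arrayIndexScale(int[].class));
        System.out.println("element 2   = " + readInt(new int[] {7, 8, 9}, 2));
    }
}
```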
Interesting use cases
Avoid initialization
The allocateInstance method can be useful when you need to skip the object initialization phase,
bypass security checks in a constructor, or get an instance of a class
that has no public constructor. Consider a class whose constructor assigns a value to one of its fields.
Instantiating it with the constructor or with reflection runs that assignment, but Unsafe.allocateInstance skips the constructor entirely and leaves the field at its default value (for example, 0 for a numeric field).
Just think about what happens to all your singletons.
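A sketch of the three instantiation paths, using a hypothetical class WithInit in place of the example class. Its constructor sets a to 1, so the constructor and reflection paths see 1, while allocateInstance returns an object whose field still has the default value 0.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical example class: the constructor performs the "initialization".
class WithInit {
    private long a;              // default value 0 until the constructor runs

    WithInit() { this.a = 1; }   // not public: no public constructor available

    long a() { return a; }
}

class AvoidInit {
    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static long viaConstructor() { return new WithInit().a(); }

    static long viaReflection() {
        try {
            return WithInit.class.getDeclaredConstructor().newInstance().a();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static long viaUnsafe() {
        try {
            // Allocates the object without calling any constructor
            return ((WithInit) unsafe().allocateInstance(WithInit.class)).a();
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaConstructor()); // 1
        System.out.println(viaReflection());  // 1
        System.out.println(viaUnsafe());      // 0
    }
}
```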
Memory corruption
This one is familiar to every C programmer. By the way, it's a common technique for bypassing security.
Consider a simple Guard class that checks access rules using its private ACCESS_ALLOWED field.
The client code is very secure and calls
giveAccess() to check access rules. Unfortunately for clients,
it always returns false. Only privileged users can somehow change the
value of the ACCESS_ALLOWED constant and get access.
In fact, that is not true: anyone with Unsafe can overwrite the field directly in memory at its objectFieldOffset.
Now all clients will get unlimited access.
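A minimal sketch of the Guard scenario. The initial value 1 and the magic value 42 that giveAccess() expects are assumptions for illustration, and GuardBreaker is a made-up name. It overwrites the private field through its objectFieldOffset rather than through a raw address, so it stays inside the object and is safe to run.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical Guard: the values 1 and 42 are illustrative.
class Guard {
    // Deliberately not final: javac would inline a final constant into giveAccess().
    private int ACCESS_ALLOWED = 1;

    boolean giveAccess() {
        return 42 == ACCESS_ALLOWED;
    }
}

class GuardBreaker {
    static void unlock(Guard guard) {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Unsafe unsafe = (Unsafe) theUnsafe.get(null);
            Field f = Guard.class.getDeclaredField("ACCESS_ALLOWED");
            // Overwrite the private field in place; "private" is not checked here
            unsafe.putInt(guard, unsafe.objectFieldOffset(f), 42);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Guard guard = new Guard();
        System.out.println(guard.giveAccess()); // false
        unlock(guard);
        System.out.println(guard.giveAccess()); // true
    }
}
```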
Actually, the same result can be achieved with reflection. What is interesting is that Unsafe lets us modify any object, even ones we have no references to.
For example, suppose there is another Guard object in memory,
located right next to the current guard object. We can modify its ACCESS_ALLOWED field by writing through the current guard at an offset of 16 bytes (the size of a Guard object on a 32-bit architecture) plus the field's objectFieldOffset.
Note that we didn't use any reference to that object.
We can calculate the object size manually or use a sizeOf method, which we define… right now.
sizeOf
Using the objectFieldOffset method, we can implement a C-style sizeof function.
This implementation returns the shallow size of an object.
The algorithm is the following: go through all non-static fields, including those of all superclasses, get the offset of each field, find the maximum, and add padding. I probably missed something, but the idea is clear.
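A sketch of the shallow sizeOf idea, with two assumptions beyond the description: the furthest field's own size is added before rounding up to the default 8-byte alignment, and the header size is estimated from a one-byte probe class (HeaderProbe, hypothetical). That estimate matches HotSpot's field layout but is not guaranteed by any specification, and arrays are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

class ShallowSize {
    private static final Unsafe U = load();

    // Heuristic probe: on HotSpot its lone byte field is laid out right after the
    // object header, so its offset approximates the header size.
    private static final class HeaderProbe { byte b; }

    private static final long HEADER = headerSize();

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static long headerSize() {
        try {
            return U.objectFieldOffset(HeaderProbe.class.getDeclaredField("b"));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    private static int fieldSize(Class<?> type) {
        if (type == long.class || type == double.class) return 8;
        if (type == int.class || type == float.class) return 4;
        if (type == short.class || type == char.class) return 2;
        if (type == byte.class || type == boolean.class) return 1;
        return U.arrayIndexScale(Object[].class); // reference size (4 with compressed oops)
    }

    // Shallow size: end of the furthest instance field, padded to 8 bytes.
    static long sizeOf(Object o) {
        long end = HEADER;
        for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                end = Math.max(end, U.objectFieldOffset(f) + fieldSize(f.getType()));
            }
        }
        return (end + 7) / 8 * 8; // assumes the default 8-byte object alignment
    }
}
```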
A much simpler sizeOf can be achieved if we just read the size value from
the class structure of the object, which is located at offset 12 in a 32-bit JVM 1.7.
normalize is a method for converting a signed int to an unsigned long, so that
the address is used correctly.
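A minimal sketch of normalize, assuming it simply masks off the sign extension of the int; since Java 8, Integer.toUnsignedLong performs the same conversion.

```java
class Normalize {
    // Treats the 32 bits of a (possibly negative) int as an unsigned value.
    static long normalize(int value) {
        if (value >= 0) return value;
        return (~0L >>> 32) & value; // mask 0x00000000FFFFFFFF drops the sign extension
    }

    public static void main(String[] args) {
        System.out.println(normalize(-1));              // 4294967295
        System.out.println(Integer.toUnsignedLong(-1)); // 4294967295, same result
    }
}
```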
Awesome, this method returns the same result as our previous sizeof
function.
In fact, for a good, safe, and accurate sizeof function, it is better to use the
java.lang.instrument package (Instrumentation.getObjectSize),
but it requires specifying the -javaagent option for your JVM.
Shallow copy
Having an implementation that calculates shallow object size, we can simply
add a function that copies objects. The standard solution requires modifying your code with Cloneable,
or you can implement a custom copy function in your object, but it won't be a general-purpose function.
The shallow copy copies the object's bytes, as many as sizeOf reports, into newly allocated memory.
toAddress and fromAddress convert an object to its address in memory and vice versa.
This copy function can be used to copy an object of any type; its size is calculated dynamically. Note that after copying you need to cast the object to the specific type.
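Copying raw bytes between object addresses is fragile, because a moving garbage collector can relocate objects between the address lookup and the copy. The sketch below swaps in a different technique that keeps the idea of a general-purpose shallow copy: allocateInstance creates the target, then each instance field is copied with the typed get/put methods. The generic signature also removes the need for a cast. ShallowCopy is a hypothetical name, and arrays are not supported.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

class ShallowCopy {
    private static final Unsafe U = load();

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Field-by-field shallow copy: primitives are copied, references are shared.
    @SuppressWarnings("unchecked")
    static <T> T copy(T original) {
        try {
            Object copy = U.allocateInstance(original.getClass()); // no constructor runs
            for (Class<?> c = original.getClass(); c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers())) continue;
                    long off = U.objectFieldOffset(f);
                    Class<?> t = f.getType();
                    if (t == int.class) U.putInt(copy, off, U.getInt(original, off));
                    else if (t == long.class) U.putLong(copy, off, U.getLong(original, off));
                    else if (t == double.class) U.putDouble(copy, off, U.getDouble(original, off));
                    else if (t == float.class) U.putFloat(copy, off, U.getFloat(original, off));
                    else if (t == short.class) U.putShort(copy, off, U.getShort(original, off));
                    else if (t == char.class) U.putChar(copy, off, U.getChar(original, off));
                    else if (t == byte.class) U.putByte(copy, off, U.getByte(original, off));
                    else if (t == boolean.class) U.putBoolean(copy, off, U.getBoolean(original, off));
                    else U.putObject(copy, off, U.getObject(original, off)); // shared reference
                }
            }
            return (T) copy;
        } catch (InstantiationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```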
Hide Password
One more interesting usage of direct memory access in Unsafe is removing
unwanted objects from memory.
Most APIs for retrieving a user's password
return byte[] or char[]. Why arrays?
It is purely for security reasons: we can nullify the array elements once we no longer need them.
If we retrieve the password as a String, it is stored as an object in memory, and nullifying that
string variable just drops the reference. The object stays in memory until the GC decides to clean it up.
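For comparison, a minimal sketch of the array-based approach, which needs no Unsafe at all: a hypothetical checkAndWipe method compares a char[] password and then overwrites it, so the secret does not wait in memory for the GC.

```java
import java.util.Arrays;

class PasswordCheck {
    // Hypothetical check: compares the supplied password, then wipes the caller's
    // array so the secret does not linger on the heap until the next GC.
    static boolean checkAndWipe(char[] password, char[] expected) {
        try {
            return Arrays.equals(password, expected); // not constant-time; illustration only
        } finally {
            Arrays.fill(password, '\0');
        }
    }
}
```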
This trick creates a fake String object of the same size and replaces the original one in memory.
Feel safe.
Multiple Inheritance
There is no multiple inheritance in Java.
Correct, except that we can cast any type to any other one, if we want.
The trick adds the String class to Integer's superclasses, so we can cast
an Integer to a String without a runtime exception.
One problem is that we must first cast to Object, to cheat the compiler.
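The compile-time half of the trick can be shown in plain Java: javac rejects a direct cast from an int to String, but accepts it once the value is typed as Object. On an unmodified JVM, the run-time checkcast then fails with ClassCastException, which is exactly what the superclass manipulation suppresses. castSucceeds is a hypothetical helper.

```java
class CastDemo {
    // String s = (String) 42;   // rejected by javac: int cannot be converted to String

    // Returns whether the run-time checkcast accepts the object as a String.
    static boolean castSucceeds(Object o) {
        try {
            String unused = (String) o; // compiles for any Object, checked only at run time
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds("text"));                       // true
        System.out.println(castSucceeds((Object) Integer.valueOf(42))); // false
    }
}
```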
Dynamic classes
We can create classes at runtime, for example from a
compiled .class file. To do that, read the class contents
into a byte array and pass it properly to the defineClass method.
This can be useful when you must create classes dynamically, for example proxies or aspects for existing code.
Throw an Exception
Don’t like checked exceptions? Not a problem.
Unsafe.throwException throws a checked exception, but your code is not forced to catch or rethrow it, just like a runtime exception.
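A minimal sketch of the trick using Unsafe.throwException; the class and method names are made up. The method declares no throws clause, yet the caller receives a genuine IOException.

```java
import java.io.IOException;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class Sneaky {
    private static Unsafe unsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // No "throws IOException" clause, yet a checked IOException escapes.
    static void fail() {
        unsafe().throwException(new IOException("checked, but undeclared"));
    }

    public static void main(String[] args) {
        try {
            fail();
        } catch (Exception e) {                         // catch (IOException e) would not compile here
            System.out.println(e.getClass().getName()); // prints java.io.IOException
        }
    }
}
```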
Fast Serialization
This one is more practical.
Everyone knows that the standard Java Serializable mechanism
is very slow. It also requires the first non-serializable superclass
to have an accessible no-argument constructor.
Externalizable is better, but it requires you to define the schema for the
class to be serialized.
Popular high-performance libraries like Kryo have dependencies, which can be unacceptable under tight memory requirements.
But a full serialization cycle can easily be achieved with the Unsafe class.
Serialization:
- Build a schema for the object using reflection. It can be done once per class.
- Use Unsafe methods getLong, getInt, getObject, etc. to retrieve actual field values.
- Add a class identifier so that the object can be restored.
- Write them to the file or any output.
You can also add compression to save space.
Deserialization:
- Create an instance of the serialized class. allocateInstance helps, because it does not require any constructor.
- Build the schema, the same as step 1 of serialization.
- Read all fields from the file or any input.
- Use Unsafe methods putLong, putInt, putObject, etc. to fill the object.
Actually, a correct implementation involves many more details, but the intuition is clear.
This serialization will be really fast.
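A compact sketch of the steps above, under strong simplifying assumptions: only int and long fields are supported, the schema is the list of instance fields sorted by declaring class and name, and the class identifier is simply the class name written with writeUTF. UnsafeSerializer is a hypothetical name, not Kryo's or any library's API.

```java
import java.io.*;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import sun.misc.Unsafe;

class UnsafeSerializer {
    private static final Unsafe U = load();

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Schema: all instance fields (superclasses included) in a stable order.
    static List<Field> schema(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass())
            for (Field f : c.getDeclaredFields())
                if (!Modifier.isStatic(f.getModifiers())) fields.add(f);
        fields.sort(Comparator.comparing((Field f) -> f.getDeclaringClass().getName() + "." + f.getName()));
        return fields;
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(o.getClass().getName()); // class identifier
            for (Field f : schema(o.getClass())) {
                long off = U.objectFieldOffset(f);
                if (f.getType() == int.class) out.writeInt(U.getInt(o, off));
                else if (f.getType() == long.class) out.writeLong(U.getLong(o, off));
                else throw new IllegalArgumentException("unsupported field type: " + f);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Class<?> type = Class.forName(in.readUTF());
            Object o = U.allocateInstance(type); // no constructor needed
            for (Field f : schema(type)) {
                long off = U.objectFieldOffset(f);
                if (f.getType() == int.class) U.putInt(o, off, in.readInt());
                else U.putLong(o, off, in.readLong()); // only int and long are ever written
            }
            return o;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```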
By the way, there have been some attempts to use Unsafe in Kryo: http://code.google.com/p/kryo/issues/detail?id=75
Big Arrays
As you know, the Integer.MAX_VALUE constant is the maximum size of a Java array.
Using direct memory allocation, we can create a SuperArray whose size is limited only by available native memory, not by the heap size or the int index range.
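A minimal SuperArray sketch, assuming one byte per element. Unlike raw Unsafe access, this wrapper adds a bounds check and frees the memory in close(), so it can be used with try-with-resources.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class SuperArray implements AutoCloseable {
    private static final Unsafe UNSAFE = load();
    private static final int BYTE = 1; // element size in bytes

    private final long size;
    private final long address;

    SuperArray(long size) {
        this.size = size;
        this.address = UNSAFE.allocateMemory(size * BYTE); // off-heap, contents uninitialized
    }

    void set(long i, byte value) {
        checkIndex(i);
        UNSAFE.putByte(address + i * BYTE, value);
    }

    byte get(long i) {
        checkIndex(i);
        return UNSAFE.getByte(address + i * BYTE);
    }

    long size() { return size; }

    // Raw Unsafe access has no bounds check; this wrapper adds one to avoid JVM crashes.
    private void checkIndex(long i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
    }

    // Off-heap memory is invisible to the GC and must be released explicitly, exactly once.
    @Override
    public void close() { UNSAFE.freeMemory(address); }

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, new SuperArray((long) Integer.MAX_VALUE * 2) would reserve about 4 GB of native memory, more elements than a single Java array can index; make sure the machine has that much memory before trying it.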
In fact, this technique uses off-heap memory, which is partially
available in the java.nio package (direct buffers).
Memory allocated this way is not located in the heap and is not managed by the GC, so take care of it
using Unsafe.freeMemory(). It also does not perform any bounds checks, so any
illegal access may crash the JVM.
It can be useful for math computations, where code operates on large arrays of data. It can also be interesting for real-time programmers, for whom GC pauses on large arrays can break latency limits.
Concurrency
And a few words about concurrency with Unsafe.
The compareAndSwap methods are atomic and can be used to implement
high-performance lock-free data structures.
For example, consider the problem of incrementing a value in a shared object from many threads.
First, we define a simple Counter interface.
Then we define a worker thread, CounterClient, that uses a Counter.
We then run many CounterClient threads against each Counter implementation and measure the final count and the elapsed time.
The first implementation is a non-synchronized counter.
It works fast, but there is no thread management at all, so the result is inaccurate. For the second attempt, let's add the easiest Java-way synchronization, the synchronized keyword.
Radical synchronization always works, but the timing is awful.
Let's try ReentrantReadWriteLock.
Still correct, and the timing is better. What about atomics?
AtomicCounter is even better. Finally, let's try the Unsafe
primitive compareAndSwapLong to see whether using it is really a privilege.
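A sketch of a compareAndSwapLong counter together with a simple threaded driver. The Counter interface shape (increment() and getCounter()) and the names CASCounter and CounterDemo are assumptions, not the original code; the driver uses plain threads rather than any particular benchmark harness, so it checks correctness, not timing.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical shape of the Counter interface described in the text.
interface Counter {
    void increment();
    long getCounter();
}

class CASCounter implements Counter {
    private static final Unsafe UNSAFE = load();
    private static final long OFFSET = offsetOf("counter");

    private volatile long counter = 0;

    @Override
    public void increment() {
        long before;
        do {
            before = counter; // read the current value
        } while (!UNSAFE.compareAndSwapLong(this, OFFSET, before, before + 1)); // retry if another thread won
    }

    @Override
    public long getCounter() { return counter; }

    private static long offsetOf(String name) {
        try {
            return UNSAFE.objectFieldOffset(CASCounter.class.getDeclaredField(name));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

class CounterDemo {
    // Starts `threads` threads, each calling increment() `increments` times.
    static long run(Counter counter, int threads, int increments) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) counter.increment();
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.getCounter();
    }

    public static void main(String[] args) {
        System.out.println(run(new CASCounter(), 8, 100_000)); // 800000
    }
}
```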
Hmm, it seems equal to atomics. Maybe atomics use Unsafe? (Yes.)
This example is simple enough, but it shows some of the power of Unsafe.
As I said, the CAS primitive can be used to implement lock-free data structures.
The intuition behind this is simple:
- Have some state
- Create a copy of it
- Modify it
- Perform CAS
- Repeat if it fails
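To make these steps concrete, here is a sketch of a Treiber stack (the classic lock-free stack) built on compareAndSwapObject; LockFreeStack is a hypothetical name. Each push reads the head, builds a new head node pointing at it (the modified copy of the state), and retries the CAS until no other thread has changed the head in between. Because popped nodes are never reused while a thread still references them, garbage collection sidesteps the ABA problem for this particular structure.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

class LockFreeStack<T> {
    private static final Unsafe UNSAFE = load();
    private static final long HEAD = headOffset();

    private static final class Node<T> {
        final T value;
        final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private volatile Node<T> head; // the shared state

    void push(T value) {
        Node<T> current, updated;
        do {
            current = head;                       // 1. read the state
            updated = new Node<>(value, current); // 2-3. build the modified copy
        } while (!UNSAFE.compareAndSwapObject(this, HEAD, current, updated)); // 4-5. CAS, retry on failure
    }

    T pop() {
        Node<T> current;
        do {
            current = head;
            if (current == null) return null;     // empty stack
        } while (!UNSAFE.compareAndSwapObject(this, HEAD, current, current.next));
        return current.value;
    }

    private static long headOffset() {
        try {
            return UNSAFE.objectFieldOffset(LockFreeStack.class.getDeclaredField("head"));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Unsafe load() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```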
Actually, in practice it is harder than you can imagine. There are many problems, like the ABA problem, instruction reordering, etc.
If you are really interested, you can refer to the awesome presentation about a lock-free HashMap.
Bonus
The documentation for the park method of the Unsafe class contains the
longest English sentence I've ever seen:
Block current thread, returning when a balancing unpark occurs, or a balancing unpark has already occurred, or the thread is interrupted, or, if not absolute and time is not zero, the given time nanoseconds have elapsed, or if absolute, the given deadline in milliseconds since Epoch has passed, or spuriously (i.e., returning for no “reason”). Note: This operation is in the Unsafe class only because unpark is, so it would be strange to place it elsewhere.
Conclusion
Although Unsafe has a bunch of useful applications, never use it.