World Domination, RL, and Java
I did get that game of Civilization in, and I totally rocked it. That means my development box is set back up and plugged in. We’re making progress.
I’ve been spending a lot of time with The Lady lately, when I haven’t been busy doing something to my house, and when she hasn’t been busy. Next weekend she’s got a very important exam to take, so she’ll be trying to study most of the week, which means I’ll have time to myself to work on things I want to do.
This weekend is shot, as I’m leaving tomorrow morning for Ohio, where I’ll be spending the weekend until Sunday late afternoon / evening.
I’ve also been getting a lot of questions in email recently about the status of the Java 1.4 port for BeOS. It’s stalled. Terribly. Andrew has a girlfriend, and we hit a major problem with R5’s memory management that makes it basically impossible to get things working correctly with any reasonable level of effort. Both Andrew and I thought it would be better to wait for Haiku to fix this, and that’s when I started really looking at helping with Haiku. Then I met this awesome woman, and well… yeah. I’ve got a girlfriend now too. So until Haiku is to the point where we can run intensive builds on it, Java is probably not going to get much attention. When that time comes, we’ll likely try to sync up with the most recent Sun JRE, and use our wonderfully unstable (when a thread stack overflows) version as the bootstrap.
This is just me talking out of my posterior though, don’t take this as the official statement. These are my personal views of how things -may- unfold.

April 16th, 2006 at 11:32 am
Do you mind explaining a bit about that show stopping problem?
Java doesn’t allow you to allocate anything on the stack, so any problem that’s occurring because of that seems to be a JVM issue, right??
April 17th, 2006 at 12:54 pm
Basically it centers around problems with R5’s create_area() doing weird things (B_EXACT_ADDRESS doesn’t seem to work, iirc) and there being no way to create areas inside of an area with more restrictive access rights.
Say I need a 64MB area that’s got B_READ_AREA | B_WRITE_AREA but I need to put 2k pages inside that area that are B_READ_AREA only.
The other option would be to allocate the whole 64MB worth as a bunch of separate areas with different permissions that all have address spaces that align nicely. The problem there is that R5 seems to be ignoring B_EXACT_ADDRESS, which breaks that idea.
The hotspot vm uses 2k guard pages that have READ permissions and a signal handler to detect stack overflows. We keep running out of stack space, and the VM doesn’t know how to handle it, as our threads start to really screw with each other — but only in complex apps with lots of threads with deep call stacks.
The problems appear much worse (without proper memory protection) when we’re dealing with SMP boxes.
April 17th, 2006 at 1:17 pm
Bryan, when you mentioned this problem in the past, you never mentioned 2K pages! Intel hardware doesn’t handle pages smaller than 4K. Windows and BeOS most certainly can’t do full protection at exact addresses that aren’t aligned on 4K boundaries, because the page tables (due to the hardware) deal in 4K pages. The best you can hope to do is trap all accesses at 4K granularity and work out each time which half of the page is valid to access, which will be a real bummer for performance. This is no different between Windows and BeOS, or probably any other OS running on x86 hardware. IIRC, you can make pages larger than 4K by powers of 2 within some range, but 4K is the minimum size possible.
So, B_EXACT_ADDRESS will be constrained to 4K boundaries. I haven’t verified whether it works at that granularity, but that’s the maximum accuracy you could ever hope for in Real Life on any x86 OS that uses the standard paging hardware. This is true of Windows as well (I have tested there). Keep in mind, too, that where there isn’t already an allocated memory area, Windows tends to require that things be allocated in 64K chunks, which may be further constrained to 64K boundaries for the sake of sanity and to avoid horribly fragmenting the virtual address space in user space. You have more control over allocating single pages of memory in the physical address space when in kernel mode in Windows, of course, since that’s likely required for device drivers.
So, if the problem with B_EXACT_ADDRESS is because it won’t work when you try every other 2K stack page that’s not 4K page aligned, that is something that can’t be fixed in the OS itself, because the OS can only work with the hardware, so you’d have to make an entire 4K page that *also* includes the 2K guard page as protected. If BeOS can’t set protections at the 4K page level with exact addresses that are page aligned and integral multiples of 4K pages in size, then that *is* a BeOS problem, and can be fixed in Haiku. This is the way of the x86, like it or not…
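To make the alignment point concrete, here is a minimal sketch of the arithmetic, assuming the 4K page size discussed above. The names page_floor, page_ceil, Span and protected_span are hypothetical helpers for illustration only; they are not part of the BeOS API or of the HotSpot sources.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: standard x86 paging with 4 KB pages, as discussed above.
constexpr std::uintptr_t kPageSize = 4096;

// Round an address down to the start of the page that contains it.
constexpr std::uintptr_t page_floor(std::uintptr_t addr) {
    return addr & ~(kPageSize - 1);
}

// Round an address up to the next page boundary (unchanged if already aligned).
constexpr std::uintptr_t page_ceil(std::uintptr_t addr) {
    return (addr + kPageSize - 1) & ~(kPageSize - 1);
}

// Hypothetical helper type: a page-aligned range of memory.
struct Span {
    std::uintptr_t start;
    std::size_t length;
};

// The page-aligned range that has to be protected in order to cover a
// guard region of `length` bytes starting at `addr`.
constexpr Span protected_span(std::uintptr_t addr, std::size_t length) {
    return Span{page_floor(addr),
                static_cast<std::size_t>(page_ceil(addr + length) - page_floor(addr))};
}
```

With these helpers, a 2K guard region that starts halfway into a page drags the whole 4K page into the protected range, and one that straddles a page boundary costs two full pages.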
I’m not sure what the usual thread stack size for the hotspot VM is, but that may be more of an OS problem with standard thread allocations, due to the 256K fixed-size stack BeOS hands out to all threads other than the main one. The main thread (IIRC) gets a 16 meg stack by default, also of a fixed size, which I find quite wasteful. Windows apps have threads that are given 1 meg stacks by default, but you can change the sizes to be much larger or smaller than that. That is something Haiku should have that BeOS doesn’t: the option of selecting the size of the stack given to threads.
April 17th, 2006 at 2:17 pm
I don’t have the code in front of me to verify that it’s 2k guard pages, or if they’re 4k. But I remember Andrew trying desperately to get things with B_EXACT_ADDRESS to work.
If it’s all just a matter of page alignment, then we may be able to work some voodoo.
The underlying problem still remains though, in that you cannot create an area inside another area with more restrictive permissions on the sub-area. Under other POSIX systems, this is possible.
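For comparison, this is roughly what the "more restrictive page inside a bigger region" trick looks like on a POSIX-like system such as Linux. It is only a sketch: the function name guard_page_inside_mapping is made up for illustration, MAP_ANONYMOUS is assumed to be available (it is on Linux and the BSDs), and this is not taken from the HotSpot sources.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Map `total` bytes read/write, then drop one page inside that mapping
// to read-only. Returns true if every call succeeded. The mapping is
// released before returning; the point is only to show that the OS
// accepts a more restrictive page inside a larger region.
bool guard_page_inside_mapping(std::size_t total, std::size_t guard_page_index) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) {
        return false;
    }
    const std::size_t page_size = static_cast<std::size_t>(page);

    // The guard page must lie entirely inside the mapping.
    if (guard_page_index >= total / page_size) {
        return false;
    }

    void *base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return false;
    }

    // mprotect() works on page-aligned addresses; mmap() returns one,
    // and we only ever offset it by whole pages.
    char *guard = static_cast<char *>(base) + guard_page_index * page_size;
    const bool ok = mprotect(guard, page_size, PROT_READ) == 0;

    munmap(base, total);
    return ok;
}
```

Calling guard_page_inside_mapping(64 * 1024 * 1024, 1) mirrors the 64MB example from earlier in the thread: one big read/write region with a single read-only page inside it.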
April 17th, 2006 at 2:18 pm
And BeOS has initial TLS sizes of 128k.
This is a different thread stack, allocated in the VM’s reserved heap. TLS is used, but not for this part of the VM.
April 17th, 2006 at 2:20 pm
You can change the size of the TLS area after it’s been created. Doing this will let you bypass the 4096 thread count limit (which was imposed by the size of thread stacks, not sizeof(thread_id). :-)
Andrew did extensive testing on this. We were able to get a -lot- of threads, and it worked without crashing all the time. I was actually quite surprised at how stable it was.
April 17th, 2006 at 2:49 pm
Perhaps the granularity for setting permissions of overlapping areas is also 64K, which would be something not completely unexpected. Sure, that’d make a horribly inefficient stack usage scenario, but somehow it could be hacked in if that’s the case, I’d suspect. It’d also likely be expensive for CPU overhead beyond memory overhead. Perhaps you could try a test to see if you can create overlapping areas with different sets of permissions at B_EXACT_ADDRESS in full 64K increments evenly aligned on 64K boundaries.
IIRC, the 4096 thread limit is in the total system, not individual applications: there’s a 193 thread limit per application, using 16 megs for the main thread, and the rest of the fixed area (as reserved when using standard thread creation API calls) divided into 256K chunks, thus using 64 megs of address space per process for stacks.
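The 193 figure falls straight out of the numbers quoted above, as this small sketch shows. The function name max_threads and the constants are illustrative only; the sizes are the ones recalled in this comment, not values checked against BeOS itself.

```cpp
#include <cstddef>

constexpr std::size_t kKiB = 1024;
constexpr std::size_t kMiB = 1024 * kKiB;

// Number of threads whose fixed-size stacks fit in a reserved region:
// one main thread, plus as many worker stacks as fit in what is left.
constexpr std::size_t max_threads(std::size_t region,
                                  std::size_t main_stack,
                                  std::size_t worker_stack) {
    return region < main_stack ? 0 : 1 + (region - main_stack) / worker_stack;
}

// Figures as recalled in the comment: 64 MB of address space for stacks,
// a 16 MB main stack, and 256 KB for every other thread.
static_assert(max_threads(64 * kMiB, 16 * kMiB, 256 * kKiB) == 193,
              "48 MB / 256 KB = 192 worker stacks, plus the main thread");
```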
I understand that TLS (Thread Local Storage) is a different critter entirely, and never bothered to look into its limits/arrangements: I see TLS as a bad hack to support legacy non-threaded APIs, such as using errno as an error code return method. Perhaps with modern CPUs going towards multicore, people will finally start getting away from that and retire the old C/POSIX libraries… nah, too many billions of lines of legacy code :P
April 17th, 2006 at 3:56 pm
Bryan, I’ve been thinking about it (and not about the things I should be thinking about… I’ll have to pay for that later, I guess) and I have an idea for how things may work. It presumes (of course) that the Be Book reflects reality when it says an area is sized in whole pages, as many as you ask for. That is, if BeOS truly does allow allocating an area that’s exactly 1 page in size whenever you ask for it to be that small, my idea will work, even if it requires moving the stacks out of the standard memory region BeOS puts them in. It all comes down to rethinking how the stacks are allocated, and not worrying about allocating smaller areas that are a subset of a larger area with different permissions :)
Here is the gist of the idea:
1. It assumes BeOS isn’t quite like Windows in that if the Be Book is correct, it allows you to use B_EXACT_ADDRESS on a 1 page boundary for a 1 page area
2. Each stack would be a separate area of areas: that is, not a single area for the entire stack area, and not a separate area for just each single stack, but a list of contiguous areas, all with different area IDs: the guard page, an active area, and perhaps another area, such as a guard page on the other end of the stack.
3. This may require that the stacks for threads are moved away from the standard 64 meg area BeOS reserves for them, because create_area and clone_area may not accept ranges outside of the standard 1 GB application address limit.
Doing things this way (assuming you can actually allocate single page-sized areas) completely eliminates the problem that you cannot allocate an area within an area that has different permissions under BeOS. Yes, it will run the system out of area IDs much quicker, but I suspect that limit isn’t reached on most systems even with lots of applications running, if only because a lot of applications are GUI applications which hit the thread limit of App_Server long before the area IDs are gone.
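The layout in point 2 can be sketched as plain address arithmetic, without touching the OS at all. Everything named here (Access, AreaPlan, plan_stack) is hypothetical and only illustrates the idea; on BeOS each entry would become its own create_area() call with B_EXACT_ADDRESS, and whether a given address range is actually available is a separate question.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 4 KB pages, as elsewhere in this thread.
constexpr std::size_t kPageSize = 4096;

// Hypothetical stand-ins for the area protection flags.
enum class Access { ReadOnly, ReadWrite };

// One planned area: where it starts, how big it is, and its protection.
struct AreaPlan {
    std::uintptr_t address;
    std::size_t size;
    Access access;
};

// Plan one thread stack as a run of back-to-back areas: a read-only guard
// page, the usable read/write stack, and optionally a second guard page
// at the other end. Returns an empty plan if `base` is not page aligned.
std::vector<AreaPlan> plan_stack(std::uintptr_t base,
                                 std::size_t usable_pages,
                                 bool upper_guard) {
    std::vector<AreaPlan> plan;
    if (base % kPageSize != 0 || usable_pages == 0) {
        return plan;
    }
    const std::uintptr_t stack_start = base + kPageSize;
    const std::size_t stack_size = usable_pages * kPageSize;

    plan.push_back({base, kPageSize, Access::ReadOnly});          // guard page
    plan.push_back({stack_start, stack_size, Access::ReadWrite}); // active stack
    if (upper_guard) {
        plan.push_back({stack_start + stack_size, kPageSize, Access::ReadOnly});
    }
    return plan;
}
```

Each stack then costs two or three area IDs instead of one, which is the trade-off mentioned above.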
April 17th, 2006 at 8:48 pm
Hey Bryan! I just wrote an incredibly simple test application to test my theory. While there are (of course!) memory ranges where B_EXACT_ADDRESS isn’t possible, you can find the ones where it is. Also, allocating a single page at a time *does* work, as does allocating the one(s) above it, as demonstrated by the simple application I’ll copy/paste below. I did run into a very curious thing that appears to be a BeOS bug, however, and that goes against the stated “fact” that areas are cleaned up once a process has terminated. I was being lazy for the sake of a test and at first left out the delete_area() calls. Without them, I found I often couldn’t get those same addresses again the next time I ran the application…. weird… But the good news is that if you have a range BeOS doesn’t already have allocated as an area, you can do what I suggested!
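//Code here:
// Note: the header names, prompt/output strings and the second
// create_area() call were lost when this comment was posted; they are
// filled in here as a best guess from the surrounding description.
#include <OS.h>
#include <iostream>
using namespace std;
int main(int argc, char** argv)
{
	// Ask which page to try; the address is the page number times 4096.
	unsigned long value = 0;
	cout << "Page number: ";
	cin >> value;
	void *DesiredAddress = (void*)(value * 4096);
	area_id FirstArea = create_area("First!",
		&DesiredAddress,
		B_EXACT_ADDRESS,
		B_PAGE_SIZE,
		B_NO_LOCK,
		B_READ_AREA | B_WRITE_AREA);
	cout << "First area: " << FirstArea << endl;
	// Try the page directly above the first one.
	void *NextAddress = (void*)((value + 1) * 4096);
	area_id SecondArea = create_area("Second!",
		&NextAddress,
		B_EXACT_ADDRESS,
		B_PAGE_SIZE,
		B_NO_LOCK,
		B_READ_AREA | B_WRITE_AREA);
	cout << "Second area: " << SecondArea << endl;
	// Clean up explicitly; see the note above about areas lingering.
	if (FirstArea > B_OK)
	{
		delete_area(FirstArea);
	}
	if (SecondArea > B_OK)
	{
		delete_area(SecondArea);
	}
	cout << "Done." << endl;
	return 0;
}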
April 17th, 2006 at 11:00 pm
Bryan,
For some weird reason (perhaps it’s on moderation because I’ve entered a bunch of posts today?) it seems Web Press ate my post. Well, I tried an experiment, and yes, while there are limitations as to where you can do B_EXACT_ADDRESS, you *can* allocate single page areas, so my idea will work. I created a simple commandline test that demonstrates being able to do that with back-to-back single page areas allocated. Strangely enough, if I didn’t call delete_area() before exiting, the next time I ran the application from Terminal I wouldn’t be able to allocate the same pages as an earlier allocation, at least not with repeatability: I’m wondering if there’s some weird bug there, or if that’s related to starting the application as a child process of Terminal, but it seemed like BeOS wasn’t releasing those small areas reliably.
April 18th, 2006 at 1:44 am
Bryan,
I wrote a very simple test app to test my theory from this most recent post (BeOS SeaMonkey doesn’t seem to work for posting here), and while there are limits as to which pages you can get via B_EXACT_ADDRESS, it *will* work, because BeOS allows you to create single page areas, and to place them back to back where possible with B_EXACT_ADDRESS.
April 18th, 2006 at 7:23 am
Maybe you could use create_area() to allocate two big areas, one with read right only, the second with read & write rights.
Then, for each small JVM page, call clone_area() on the corresponding access rights area with the good offset (you’ll have to handle these offsets yourself, unfortunately) and the page size.
Well, maybe I’m just missing the point here.