source: trunk/kernel/doc/40bits @ 153

Last change on this file since 153 was 1, checked in by alain, 8 years ago

First import

1-Event Manager:
The event manager has been changed in the following ways:
        - the API has been changed: instead of giving a pointer as the second argument to event_send, we give only
        the gpid of the target processor.
        - local event handling is globally the same
        - remote event handling has been changed. It must be carefully handled with physical accesses:
                - the kfifo of the remote cpu is physically accessed
                - the event_notifier must notify after copying the event structure (on the stack ?).
                        !!! This does not imply that the pointer arguments are copied !!!
                        => modify event handler
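
A minimal sketch of the "copy the event, then notify" ordering described above. Only event_send and the gpid argument come from these notes; the sk_ prefixed types, the fifo layout, its capacity and the notified flag are assumptions made for illustration, and an ordinary array stands in for the physically accessed remote kfifo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_FIFO_SLOTS 8u   /* hypothetical fifo capacity */
#define SK_MAX_CPUS   4u   /* hypothetical number of processors */

/* Hypothetical event layout: 'arg' is copied as a value only,
 * the data it points to is NOT copied (see the warning above). */
struct sk_event {
    uint32_t id;
    void    *arg;
};

/* Stand-in for the per-cpu kfifo. */
struct sk_fifo {
    struct sk_event slots[SK_FIFO_SLOTS];
    uint32_t head;      /* next slot to read */
    uint32_t tail;      /* next slot to write */
    bool     notified;  /* stand-in for the notification */
};

static struct sk_fifo sk_fifos[SK_MAX_CPUS];

/* Copy the event into the fifo of processor 'gpid', then notify.
 * Returns 0 on success, -1 if gpid is invalid or the fifo is full. */
static int sk_event_send(const struct sk_event *ev, uint32_t gpid)
{
    assert(ev != NULL);
    if (gpid >= SK_MAX_CPUS)
        return -1;
    struct sk_fifo *f = &sk_fifos[gpid];
    if (f->tail - f->head == SK_FIFO_SLOTS)
        return -1;
    f->slots[f->tail % SK_FIFO_SLOTS] = *ev;  /* copy first */
    f->tail++;
    f->notified = true;                       /* notify second */
    return 0;
}

/* Pop the oldest event of processor 'gpid' into 'out'.
 * Returns 0 on success, -1 if gpid is invalid or the fifo is empty. */
static int sk_event_receive(uint32_t gpid, struct sk_event *out)
{
    assert(out != NULL);
    if (gpid >= SK_MAX_CPUS)
        return -1;
    struct sk_fifo *f = &sk_fifos[gpid];
    if (f->head == f->tail)
        return -1;
    *out = f->slots[f->head % SK_FIFO_SLOTS];
    f->head++;
    return 0;
}
```

Because the structure is copied by value, the sender may reuse or free its own event right after the call; anything reachable through arg must still stay valid until the handler has run.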

3-Processes/Thread Management

        3.1) Task Management

                There is one global table that indexes all processes by pid.
                Replicating the kernel will also replicate this structure...
                The fastest way to coordinate the replicas:
                        - Use the table of cluster 0 as the main table. Each table will contain
                        a pointer to the task structure if the task is local; otherwise the table entry is
                        either marked and contains the id of the cluster which handles the task, or a NULL value.
                        A NULL value in the main table means that the entry has not been allocated.
                        A NULL value in any other table must be confirmed by checking the main table.

                        This implies a change in pid allocation, which must allocate from the main table by setting
                        the cluster id and marking the entry with a remote flag (0x1). // TODO more on synch

                        The lookup needs to check for the task locally; if we find a task address we return it.
                        If we find a remote entry we go to the indicated cluster. If the remote cluster does not contain
                        the address of the task ...
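
The entry encoding above can be sketched as a tagged word. The remote flag value 0x1 is taken from the notes; storing the cluster id in the bits above the flag, the sk_ names and the result codes are assumptions, and synchronisation (the TODO above) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_REMOTE_FLAG ((uintptr_t)0x1)  /* remote flag (0x1) from the notes */

struct sk_task { int pid; };             /* placeholder task descriptor */

/* One table entry: 0 (NULL), a local task pointer,
 * or (cluster id << 1) | SK_REMOTE_FLAG. */
typedef uintptr_t sk_entry_t;

static sk_entry_t sk_entry_local(struct sk_task *t)
{
    /* assumes task structures are at least 2-byte aligned, so bit 0 is free */
    assert(((uintptr_t)t & SK_REMOTE_FLAG) == 0);
    return (uintptr_t)t;
}

static sk_entry_t sk_entry_remote(uint32_t cid)
{
    return ((uintptr_t)cid << 1) | SK_REMOTE_FLAG;
}

enum sk_result {
    SK_FOUND,       /* *task holds the local task */
    SK_REMOTE,      /* *cid holds the cluster to ask */
    SK_CHECK_MAIN,  /* NULL in a non-main table: confirm in the main table */
    SK_FREE         /* NULL in the main table: pid not allocated */
};

/* Classify the entry for 'pid' in 'table'. */
static enum sk_result sk_lookup(const sk_entry_t *table, int is_main_table,
                                size_t pid, struct sk_task **task, uint32_t *cid)
{
    sk_entry_t e = table[pid];
    if (e == 0)
        return is_main_table ? SK_FREE : SK_CHECK_MAIN;
    if (e & SK_REMOTE_FLAG) {
        *cid = (uint32_t)(e >> 1);
        return SK_REMOTE;
    }
    *task = (struct sk_task *)e;
    return SK_FOUND;
}
```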


                Excuse me! The fastest way is to directly use the main table as the only table and ignore the other tables!!!
                This will require putting physical addresses in the table to point to task structures, synchronised
                with a lock.

        3.2) Task descriptor

        This structure contains the following structures that are shared between threads:
                - vmm (Virtual Memory Manager) : see 2-.
                - pointers to root/cwd file, fdinfo and binary file: use physical addresses
                - pointer to the parent task: use a parent cluster id field, or directly a physical address ?
                - 2 lists : one for the children and one for the siblings
                - a signal manager (NULL by default ?)
                - thread handling structures :
                        - a table containing a reference to all the threads (and a pointer to its page struct)
                        - a list entry root that encompasses all threads: not really used, can be deleted

        How to manipulate these fields ? => by message passing ?
        For thread pointers we directly store the ppn: thread structures are aligned on a page.
        Other pointers (fd_info, bin) are to local structures.
        vmm handling is local for a single-threaded task; otherwise the handling is done by message passing.

        3.3) Thread Management
        In the second stage...



2-Memory Management:
        vmm: message passing to the processor that contains the main structure.
        There is no need to access the thread data except for statistics purposes!
        There is a part that could be done by the thread with passing a message... look in the page table part

        To think about: page faults at the same addresses, how to avoid them (merge the page fault messages) ?

        kmem: all remote allocations are forbidden except for user pages (at least for now).
        (user remote allocation has been done in the second stage.)


3-Cluster Manager:
        One cluster manager for each cluster: no need for the global table...no need for the replication of cluster addresses
        ...no need for procs base addresses: the cpu_s are directly put into the cluster structures; with the cluster base
        address the compiler will offset to the right address.


3-DQDT:
        Initialisation has been modified to not require the remote allocation API. We set a table of two dqdt in the



4-Drivers




5-Boot:
        A) The main proc of the boot cluster enters the boot loader, initialises the stack bases for
        all other procs in the boot_tbl and waits for all other procs to copy the needed info (kernel, boot,
        boot info) from the boot cluster.

        B) All other procs enter the boot_loader, allocate a stack at the end of the cluster ram and
        wait for their cluster to be initialised by their main proc. (except for those of the boot cluster ?)

        C) The main procs of the other clusters enter the boot loader and set the sp register to point to
        the stack at the end of the current cluster ram.
        (Enter the C code and) Copy the kernel/boot/boot_info from the main cluster to their cluster,
        increment a value and go back to sleep (wait instruction).

        D) All procs sleep after incrementing a counter in the original BI (boot cluster).

        E) Once the counter reaches the number of procs, the main proc of the boot cluster can then
        copy the kernel/boot/boot_info in the same cluster but at the correct addresses. This will erase the
        code of the preloader. (the sp pointer of the boot cluster is good, the preloader should set it to
        point

        !!! Moving the boot code is dangerous, since we are currently executing it ... and the
        addresses in it are fixed ?!!! If we really want this space, we need to give up on the
        boot code at this stage; the BI can be kept.
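
Steps D and E rely on a shared arrival counter in the BI. A possible shape for it, using C11 atomics, is sketched here. The structure and function names are hypothetical, and whether the boot proc counts itself in nr_procs is left as an assumption of the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of the boot info (BI) kept in the boot cluster. */
struct sk_boot_info {
    atomic_uint arrived;   /* procs that have finished copying */
    unsigned    nr_procs;  /* count the boot proc waits for (assumption) */
};

static void sk_boot_info_init(struct sk_boot_info *bi, unsigned nr_procs)
{
    assert(bi != NULL);
    atomic_init(&bi->arrived, 0u);
    bi->nr_procs = nr_procs;
}

/* Step D: called by a proc just before it goes to sleep.
 * Returns its arrival rank, starting at 1. */
static unsigned sk_boot_checkin(struct sk_boot_info *bi)
{
    return atomic_fetch_add(&bi->arrived, 1u) + 1u;
}

/* Step E: polled by the main proc of the boot cluster before it
 * overwrites the preloader area. */
static bool sk_boot_all_arrived(struct sk_boot_info *bi)
{
    return atomic_load(&bi->arrived) == bi->nr_procs;
}
```

On the real hardware the increment would have to be a physical access to the boot cluster's memory; the sketch only shows the counting logic.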

        ***********************************************************************************************************
        Initial memory content at boot (TODO):

        ------------------ 0x0000 0000
        Preloader
        ------------------ ENDPRELOADER
        Kernel
        ------------------ END kernel paddr (important here, the loading address (paddr) is different from the vaddr)
        BI
        ------------------
        Boot code
        ------------------ End of boot
        .....
        ------------------ end of cluster space - 4 boot stacks
        Boot stacks (x nbproc)
        ----------------- End of cluster space


        The boot code's role is to set:
                1 - the kernel and boot info: move them to address 0
                2 - the stacks for each proc
                3 - initialise the info structure (allocated on the stack): don't forget to store the BI address
                        that results from moving it.
                4 - initialise the page table and the mmu if we are booting in virtual mode

        ***** Phase 1: boot proc of the boot cluster
        - after the preloader, only the boot_proc has a stack and all other procs are waiting (on a WTI).
        - jump to C code
        - wake up all procs (possible improvement: set up a distributed wakeup)

        ***** Phase 2: all procs (the boot cluster proc)

        - start by setting a stack at the end of the current cluster
        - jump to C code
        - set the info structure
        - copy the kernel and BI (only one proc per cluster) to the start of the current cluster
        (- set the page table and the mmu)
        - jump to the kernel (of the current cluster) and wait in it ? Yes, wait for the
          kernel image of cluster 0 to be set (since we will probably synchronise with it)


        ? how to signal the procs of the current cluster to jump to the kernel ?

        --> the boot code is now used only by the boot cluster
        --> the preloader code is used only by the other procs of the boot cluster

        ***** Phase 3:
        - put a barrier to be sure that all procs have entered the boot code
        - all procs set their stack: to be placed after the boot proc stack
        (- set the page table and the mmu)

        --> the preloader code is no longer in use

        - copy the kernel and BI of the boot cluster (by the boot proc)
        - all procs of the boot cluster can now jump to the kernel and wait
        - the boot proc enters the kernel and wakes up one proc per cluster (except for the current
          boot cluster)

        --> the boot code is no longer in use


        !! be careful: the stacks of all the procs are still in use !!
        put them in the reserved zone ?

        N.B.: all addresses of the boot cluster are found using compiled-in variables


        ****** It's up to the kernel now...

Kernel boot:
        ***** Phase 1: only the boot proc
        wake up one proc per cluster
        !!! Must be sure that they are sleeping (waiting) ? !!!

        ***** Phase 2: one proc per cluster
        Tasks: '->' indicates a dependency

        1) initialise task bootstrap and task manager and set idle arg
                postponed: devfs/sysfs roots

        2) initialise arch_memory : (initialise boot_tbl: done in the boot phase); initialise
                clusters_tbl ?
        initialise arch: initialise current cluster: initialise ppm (physical page memory) (can
                                alloc now!), cpu_init : event listener!
                         initialise devices (printk) (put all in cluster devices lists, route all
                        interrupts locally (to the proc handling them));
                         and register them (!!!to be postponed!!!) -> depends on the message api


        3) --Barrier--: here we should wait until everybody has set their event listener.

        only main proc of I/O cluster: initialise fs cache locally(?);
                                        initialise devfs/sysfs roots and set devices.
        ----Barrier ?------


        4) task_bootstrap_finalize :  should not be needed, all should be done in task_init (TODO)
           kdmsg_init : a set of locks (isr, exception, printk)_lock  and printk_sync

        5) DQDT init: arch_dqdt_init and dqdt_init (TODO)

        6) Forget about bootstrap_replicate_task and cluster_init_table
                a local cluster_init_cores is needed ?

        register devices in fs ?

        **** Phase 3: all procs (thread idle)

        set thread_idle for each proc (in parallel ?)

        load thread_idle: free reserved pages (the boot loader reserved pages...), create the event
                                thread event manager, kvfsd only in the cluster_IO.


        *** Phase 4: kvfsd (only main cpu of cluster IO)

        enable all irqs.

        initialise fs -> needs __sys_blk
        task load init requires the devfs file system
        if it fails, start kminishell


Kernel Running Modifications:

        *** DQDT (Resource handling): Must be set using physical addresses ?

        *** FS support: Open, read, write ... all physical access or just when reaching the device ?

        *** Memory support: when there is no multithreaded application (only kthreads) the memory
                management is all done locally. Cross-cluster messaging for remote creation.

        *** Task support: a replicated task manager; pids are now composed of XYN, where X and Y are the
                cluster offset and N is the offset in the current cluster. Remote creation can be done
                only using message passing.
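
One way to read the XYN pid format is as three packed bit fields. The notes do not give field widths, so the 4/4/16-bit split below, like the sk_ names, is purely an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths, not taken from the notes. */
#define SK_X_BITS 4u
#define SK_Y_BITS 4u
#define SK_N_BITS 16u

typedef uint32_t sk_pid_t;

/* Pack cluster coordinates (x, y) and the per-cluster index n into a pid. */
static sk_pid_t sk_pid_make(uint32_t x, uint32_t y, uint32_t n)
{
    assert(x < (1u << SK_X_BITS));
    assert(y < (1u << SK_Y_BITS));
    assert(n < (1u << SK_N_BITS));
    return (x << (SK_Y_BITS + SK_N_BITS)) | (y << SK_N_BITS) | n;
}

/* Extract the X cluster coordinate. */
static uint32_t sk_pid_x(sk_pid_t p)
{
    return (p >> (SK_Y_BITS + SK_N_BITS)) & ((1u << SK_X_BITS) - 1u);
}

/* Extract the Y cluster coordinate. */
static uint32_t sk_pid_y(sk_pid_t p)
{
    return (p >> SK_N_BITS) & ((1u << SK_Y_BITS) - 1u);
}

/* Extract the index inside the owning cluster. */
static uint32_t sk_pid_n(sk_pid_t p)
{
    return p & ((1u << SK_N_BITS) - 1u);
}
```

With such a layout the owning cluster can be read straight from the pid, so a lookup knows where to send its message without consulting a global table.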

        *** Device support:

        *** Exceptions, Interrupts (not syscalls): all device interrupts are routed and handled
                        locally, as configured at boot time.

        Add the cluster span to the arch config or to the cluster_entry. Yes, it must be defined by the
        boot code since this is the code that decides the offset when using virtual memory.

        *** User space support:
                - Syscall: switch addressing mode when entering or leaving a syscall; save the coproc2 extend register
                - copy from/to user must be modified


Development process:
        ********** Phase one *******************
        Consists of booting the kernel with no user space support and with dummy dqdt support

        *** Stage 0:
                Rewrite the kernel with a view to moving to 40 bits:
                        Handle global variables:
                        1) remove unused global variables
                        2) simplify the message passing API
                        3) do a simplified DQDT : replace dqdt files by a dummy : next_cluster++
                        4) task and thread creation only using message passing
                        5) do a true copy_from_usr and copy_to_usr !
                        6) delete the assumption that all addresses above KERNEL_OFFSET belong
                                         to the kernel (generally used to verify whether an address belongs to user space);
                                         replace it by checking that the address is between USR_OFFSET and _LIMIT
                                         (see libk/elf.c for an example).
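
Item 6 amounts to an explicit range check. The sketch below assumes 32-bit virtual addresses and an exclusive upper bound; the two constants are placeholders, not the kernel's real USR_OFFSET and limit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bounds for the user address range (assumed values). */
#define SK_USR_OFFSET UINT32_C(0x00400000)
#define SK_USR_LIMIT  UINT32_C(0x80000000)  /* exclusive */

/* True if [addr, addr + len) lies entirely inside the user range.
 * Written so that addr + len is never computed and cannot overflow. */
static bool sk_is_user_range(uint32_t addr, uint32_t len)
{
    if (addr < SK_USR_OFFSET || addr >= SK_USR_LIMIT)
        return false;
    return len <= SK_USR_LIMIT - addr;
}
```

Checking both bounds, instead of comparing against KERNEL_OFFSET only, keeps the test valid once kernel addresses are no longer guaranteed to sit above user space.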


        *** Stage 1:
                Developing the boot code: requires setting up the disk, developing virtual memory with
                        a view to a dual mode of booting, replicating the kernel and putting it
                        at address zero.
                Developing the kernel boot code (reaching kern_init and printing a message ?), setting boot_dmsg

        *** Stage 2: (TODO: synchronise with stage 0)
                Rewrite of most of the kernel:
                        ******* GOALS:
                        1) delete (replace) references to (unused) global variables:
                                - cluster_tbl : no such thing as cluster init...
                                        references to the cluster struct should be removed to keep only cids: this will affect a lot of code!
                                        No more access to the remote listener ptr (event.c), nor to the cpu struct, cluster, or ppm of a remote cluster ...

                                - devfs_db replaced by the root dentry/inode of the devfs file system and, since
                                there is only one kernel that uses it => allocate it dynamically
                                - similar for sysfs_root_entry ?
                                - devfs_n_op/devfs_f_op : are const
                                - (dma,fb,icu,iopic,sda,timer,xicu,tty)_count are per cluster; the
                                  new naming scheme is device_cidn
                                - rt_timer: not used => remove it
                                - (exception, isr, printk, boot)_lock: are all used to lock the printing
                                - kexcept_tty, klog_tty, tty_tbl, kisr_tty, kboot_tty ???

                                - soclib_(any device)_driver are RO: can we mark them as const ?
                                - boot_stage_done: not used

                        2) Event Manager to use physical addresses.
                        3) fs
                        4) all interrupts (produced by devices and WTI) are handled locally
                        5) all exceptions are handled locally (there is no remote page fault, since
                                there is no user space => no multithreaded user space application)
                        6) Complete the kernel boot phase.
                        7) Context switch: add the saving of the coproc registers: DATA_extend register and Virtual
                                mode register
        ?cpu_gid2ptr: should be modified to fetch only from the local cluster; if a remote cluster is requested, it should return an error
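
A local-only cpu_gid2ptr could look roughly like this. The gid numbering (cluster id times cpus per cluster, plus local index), the number of cpus per cluster and all sk_ names are assumptions; the real cpu_s and cluster structures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_CPUS_PER_CLUSTER 4u  /* assumed */

struct sk_cpu { uint32_t gid; };  /* placeholder for cpu_s */

/* Placeholder cluster structure with its cpu_s embedded directly,
 * as proposed in the Cluster Manager section. */
struct sk_cluster {
    uint32_t      cid;
    struct sk_cpu cpus[SK_CPUS_PER_CLUSTER];
};

/* Return the cpu descriptor for 'gid' if it belongs to the local
 * cluster, or NULL (the error case) if it belongs to a remote one.
 * Assumes gid = cid * SK_CPUS_PER_CLUSTER + local index. */
static struct sk_cpu *sk_cpu_gid2ptr(struct sk_cluster *local, uint32_t gid)
{
    assert(local != NULL);
    uint32_t cid = gid / SK_CPUS_PER_CLUSTER;
    uint32_t lid = gid % SK_CPUS_PER_CLUSTER;
    if (cid != local->cid)
        return NULL;
    return &local->cpus[lid];
}
```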

        *** Stage N: user space support
                $27 is now used by the kentry code! User space should no longer use it
                The switch to user space needs to be modified to set off the MMU and reset it when going back...
                set the mmu register value (0xF) in the init context

***Structure modifications (studying only the struct fields)****

** NOTATIONS:
-PB: problem,
-HD: handled in the corresponding structure. This is for structures that are
directly embedded in the structure and which contain a non-local pointer
-NPB: no problem
-LPTR: local pointer
-RPTR: remote pointer
-RPTR-S: RPTR that is static => points to a structure that cannot be moved across clusters
-RPTR-D: RPTR that is dynamic => points to a structure that can be moved across clusters
(which means that the ptr needs to be updated!)
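
To make the LPTR/RPTR distinction concrete, a remote pointer can be modelled as a 40-bit value held in a 64-bit integer. The split used here (8 bits of cluster id above a 32-bit cluster-local address) and the sk_faddr_t name are assumptions for this sketch, not the definition of the faddr type mentioned further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_LOCAL_BITS 32u   /* assumed: low 32 bits are the local address */
#define SK_CID_BITS   8u    /* assumed: 40 - 32 = 8 bits of cluster id */

typedef uint64_t sk_faddr_t;  /* hypothetical "far address" (RPTR) */

/* Build a far address from a cluster id and a cluster-local address. */
static sk_faddr_t sk_faddr_make(uint32_t cid, uint32_t local)
{
    assert(cid < (1u << SK_CID_BITS));
    return ((sk_faddr_t)cid << SK_LOCAL_BITS) | local;
}

/* Cluster that owns the pointed-to structure. */
static uint32_t sk_faddr_cid(sk_faddr_t fa)
{
    return (uint32_t)(fa >> SK_LOCAL_BITS);
}

/* Address of the structure inside its cluster (what an LPTR would hold). */
static uint32_t sk_faddr_local(sk_faddr_t fa)
{
    return (uint32_t)fa;
}

/* True if the far address can be used as a plain local pointer. */
static bool sk_faddr_is_local(sk_faddr_t fa, uint32_t my_cid)
{
    return sk_faddr_cid(fa) == my_cid;
}
```

Under this model an RPTR-D field is one whose cluster id part may change when the target migrates, which is why such fields need an update path.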


** Structures that need to be accessed remotely (and do they need to be dynamic ?)
struct task_s;
struct thread_s;


** Structure details

        + thread structure (thread.h)
        struct thread_s
        {
                struct cpu_uzone_s uzone;//NPB
                spinlock_t lock;//PB:HD
                thread_state_t state;//NPB
                ...//NPB
                struct cpu_s *lcpu;//NPB: LPTR   /*! pointer to the local CPU description structure */
                struct sched_s *local_sched; //NPB LPTR /*! pointer to the local scheduler structure */

*               struct task_s *task; //PB: RPTR

                thread_attr_t type;//NPB           /*! 3 types : usr (PTHREAD), kernel (KTHREAD) or idle (TH_IDLE) */
                struct cpu_context_s pws;//NPB     /*! processor work state (register saved zone) */

*               struct list_entry list;//PB       /*! next/pred threads at the same state */
*               struct list_entry rope;//PB       /*! next/pred threads in the __rope list of thread */
                struct thread_info info;//HD      /*! (exit value, statistics, ...) */

                uint_t signature;//NPB
        };
        struct thread_info
        {
                ...//NPB
                struct cpu_s *ocpu;//PB: RPTR-S
                uint_t wakeup_date;//NPB           /*! wakeup date in seconds */
                void *exit_value;//NPB                  /*! exit value returned by thread or joined one */
*               struct thread_s *join;//PB:RPTR-D              /*! points to waiting thread in join case */
*               struct thread_s *waker;//RPTR-D
                struct wait_queue_s wait_queue;//PB:HD
*               struct wait_queue_s *queue;//PB
                struct cpu_context_s pss;//NPB      /*! Processor Saved State */
                pthread_attr_t attr;//HD
                void  *kstack_addr;//NPB:LPTR (always moved with the thread)
                uint_t kstack_size;//OK
                struct event_s *e_info;//never used!!!!!!!!!!!
                struct page_s *page;//LPTR ?
        };
        typedef struct
        {
                ...//NPB
                void *stack_addr;//user space ?
                size_t stack_size;
                void *entry_func;//USP                  /* mandatory */
                void *exit_func;//USP                   /* mandatory */
                void *arg1;//?
                void *arg2;//?
                void *sigreturn_func;//?
                void *sigstack_addr;//USP todo
                size_t sigstack_size;//?
                struct sched_param  sched_param;//PPB
                ...//NPB                        /* mandatory */
        } pthread_attr_t;

        + task structure
        struct task_s
        {
                /* Various Locks */
                mcs_lock_t block;//HD
                spinlock_t lock;//HD
                spinlock_t th_lock;//HD
                struct rwlock_s cwd_lock;//HD
                spinlock_t tm_lock;//local:HD

                /* Memory Management */
                struct vmm_s vmm;//HD

                /* Placement Info */
                struct cluster_s *cluster;//local
                struct cpu_s *cpu;//local

                /* File system */
                //for cwd and root they should always point to the same
                //file struct (not node) to avoid having to ensure coherence
                //if one of them changes
                struct vfs_inode_s *vfs_root;//
                struct vfs_inode_s *vfs_cwd;//
                struct fd_info_s  *fd_info;//embed it
                struct vfs_file_s *bin;//Static: embed it

                /* Task management */
                pid_t pid;//NPB
                uid_t uid;//NPB
                gid_t gid;//NPB
                uint_t state;//NPB
                atomic_t childs_nr;//NPB
                uint16_t childs_limit;//NPB

                //make them of type faddr
                struct task_s *parent;//PB
                struct list_entry children;//PB
                struct list_entry list;//PB

                /* Threads */
                uint_t threads_count;
                uint_t threads_nr;
                uint_t threads_limit;
                uint_t next_order;
                uint_t max_order;

                //add a function to register after we create it ?
                BITMAP_DECLARE(bitmap, (CONFIG_PTHREAD_THREADS_MAX >> 3));
                struct thread_s **th_tbl;//PB: TODO: local table of pointers to avoid the page below?
                struct list_entry th_root;
                struct page_s *th_tbl_pg;//NPB: local (make it local)

                /* Signal management */
                struct sig_mgr_s sig_mgr;

        #if CONFIG_FORK_LOCAL_ALLOC
                struct cluster_s *current_clstr;
        #endif
        };


File per file modifications:
        *** kern dir files:
*PKB            - kern_init.c (see above)
                - atomic.c: handles atomic_t and refcount_t objects (try to keep them local ? It will
                  be hard (fs, event handler?))
                - barrier.c: used only by sys_barrier.c
                - blkio.c: this layer allows an fs to send requests to a block device; all requests pass
                  through here. (It manipulates a device struct. With this, is it best to replicate the notion
                  of device ? NO). Use blkio async to send an IPI blkio request.
                - cluster.c : some init functions, cid2ptr (to delete ?), manager, key_op (never used)
                - cond_var.c: used only by user space (at least for now)
                - cpu.c : the function cpu_gid2ptr accesses the cluster structure! See the structure for more
                - do_exec : use copy_uspace!
                - do_interupt : remove thread migration handling ?
                - do_syscall : (N.B. the context save is for the fork syscall) remove thread migration ?
                - event : modify to support the ...
                - (keysdb/kfifo/mwmr/mcs_sync/radix/rwlock/semaphore/spinlock).c: their modification depends on the scope in which they are used
                - kthread_create: local
                - rr-sched: should be local
                - scheduler: (thread_local_cpu) should be local
                - task.c: should get simplified => remove the replicate calls (they may be used to see how the migration goes)
                - thread_create.c: use of cpu_gid2ptr, should be local
                - destroy|dup|idle.c: should be local
                - thread migrate.c: needs to be modified
        SYSCALLS
        Miscellaneous:
                - sys_alarm.c: manipulates local data (and thread migration is initiated by each thread independently...)
*PKB            - sys_barrier.c: not local, too complex for no reason except performance; should we postpone this functionality?
                                I think we need such a struct for the boot. This means that postponing is not possible, but we can simplify it
                - sys_clock.c: local
                - sys_cond_var.c:
                - sys_dma_memcpy.c:
                - sys_rwlock.c
                - sys_sem.c
        Task:
                - sys_exec.c
                - sys_fork.c
                - sys_getpid.c
                - sys_ps.c
                - sys_signal.c
                - sys_thread_create.c
                - sys_thread_detach.c
                - sys_thread_exit.c
                - sys_thread_getattr.c
                - sys_thread_join.c
                - sys_thread_migrate.c
                - sys_thread_sleep.c
                - sys_thread_wakeup.c
                - sys_thread_yield.c
        Task Mem:
                - sys_madvise.c
                - sys_mcntl.c
                - sys_mmap.c
                - sys_sbrk.c
                - sys_utls.c
        FS:
                - sys_chdir.c:
                - sys_close.c:
                - sys_closedir.c:
                - sys_creat.c:
                - sys_getcwd.c
                - sys_lseek.c
                - sys_mkdir.c
                - sys_mkfifo.c
                - sys_open.c
                - sys_opendir.c
                - sys_pipe.c
                - sys_read.c
                - sys_readdir.c
                - sys_stat.c
                - sys_unlink.c
                - sys_write.c
        mm/:
                Task related:
                vmm.c: munmap (region: detach, split, resize); mmap (shared, private | file, anonymous);
                        sbrk (resize heap region TODO: use the resize func), madvise_migrate, vmm_auto_migrate,
                        (modify page table attribute and set migrate flag), madvise_will_need;
                vm_region.c:...
                Other:
                ppm.c: handles page descriptors. We find some uses of the cluster ptr that should be removed ?
                page.c: to make local



Questions:
1) How is time handled in almos? Do we use a unique timer to keep the time seamless across nodes ?
2) Copy to/from userspace
3) Cross-cluster lists: thread lists
4) Only the kentry is mapped when entering the kernel from user space... This is going to be a complex point.
Three new things need to be done:
1- if from user space (or simply if virtual mode was active), switch to the physical mode of the local cluster
2- if from kernel, save the DATA_EXT register. Also, if we do cpy_uspace by switching to the virtual space, we need to
do what we do in 1.


Remark:
For cross-cluster access, also keep in mind that other kinds of access are possible, like page-based access (temporarily map a page).




Task and thread subsystems:
Per cluster vmm or per thread ?

Per cluster: 1) interest: less physical space consumed and less coherence to handle
             2) Problems: "resharing" of the stack zone could be complex (making it impossible to migrate the stack of a thread and thus the thread) ?

Per thread:  1) interest: virtualising the virtual mapping between threads becomes easy (for stack and TLS)
                                and temporary mappings are a lot easier...
                                (and we could free the $27 register!)
             2) Problems: too much use of physical space (cache) + requires more coherence messages



Files need to be accessible across clusters, since they could be shared by tasks!



TODO:
Check that we have mapped enough of the kernel for the kentry ?



The VMM replicated region (mapper) requires that the insertion into the page cache be atomic!


Development results:
event_fifo.c
remote operation.c
arch_init.c: ongoing for devices
cluster_init.c
kentry.c: k1 is now used by the kernel! No, it's that we use k1 only when we were in
kernel mode... Otherwise, when coming from user mode, we don't need it and save it?