Age | Commit message | Author |
|
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
It's useful to have an event emitted when all of the sceneQueue tasks
have completed since the metadata can hook this for processing.
Therefore add such an event.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
If you hit Ctrl+C at the right point, the system processes the request
but merrily continues building. It turns out finish_runqueue() is called
but this doesn't stop the later generation and execution of the
runqueue.
This patch adjusts some of the conditionals to ensure the build really
does stop.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
In dry run mode, stamps for noexec tasks are being written out which
is incorrect. Avoid this.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
When looking at a list of tasks, do_patch and do_unpack were being
given equal priority when one clearly depends on the other. The
reason for this was the default task weight of 0 being assigned to tasks.
This is therefore changed to 1 to allow correct weighting of dependencies,
which means the scheduler has better information available to it about
tasks.
Weight endpoints differently (10) for clearer debugging of priorities.
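
As a rough illustration of the weighting described above (a sketch only,
not the runqueue code): every task starts at weight 1, endpoints start at
10, and each task adds its weight to the tasks it depends on. The
`compute_weights` name and the `depends` dict layout are assumptions made
for this example.

```python
from collections import deque


def compute_weights(depends):
    """depends maps task -> set of tasks it depends on (hypothetical layout)."""
    tasks = set(depends)
    for deps in depends.values():
        tasks |= set(deps)

    # Count how many tasks depend on each task (reverse dependencies).
    pending = {t: 0 for t in tasks}
    for deps in depends.values():
        for dep in deps:
            pending[dep] += 1

    weight = {t: 1 for t in tasks}          # default weight of 1, not 0
    endpoints = [t for t in tasks if pending[t] == 0]
    for t in endpoints:
        weight[t] = 10                      # endpoints stand out when debugging

    # Walk from the endpoints back towards the earliest tasks.
    queue = deque(endpoints)
    while queue:
        task = queue.popleft()
        for dep in depends.get(task, ()):
            weight[dep] += weight[task]
            pending[dep] -= 1
            if pending[dep] == 0:
                queue.append(dep)
    return weight
```

With a chain do_fetch <- do_unpack <- do_patch <- do_build, do_unpack ends
up heavier than do_patch, so the two no longer tie. With a default of 0
and no distinct endpoint weight, every task in the chain would stay at 0.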
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
The zero priority task should be run first but was being confused with
the None value the priority field defaulted to. Check for None
explicitly to avoid this error.
In the real world this doesn't change much but it confused the debug
output from the schedulers.
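
The pitfall can be shown with a small sketch. The function names and the
"lower number runs first" priority map are assumptions for illustration,
not the scheduler's real code.

```python
def pick_best_buggy(tasks, priority):
    """Truthiness test: a best priority of 0 looks the same as 'unset'."""
    best, best_prio = None, None
    for task in tasks:
        prio = priority[task]
        if not best_prio or prio < best_prio:   # 0 is falsy, so 0 gets replaced
            best, best_prio = task, prio
    return best


def pick_best(tasks, priority):
    """Explicit None test: priority 0 is a real value and wins."""
    best, best_prio = None, None
    for task in tasks:
        prio = priority[task]
        if best_prio is None or prio < best_prio:
            best, best_prio = task, prio
    return best
```

For priorities {"a": 2, "b": 0, "c": 1} visited in that order, the buggy
version settles on "c" while the explicit check picks "b".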
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
The first part of the sstate code checks en masse whether given checksums
are available. The next part of the code then triggers those setscene
tasks, either running them or skipping them if they've been covered by
others.
The problem was that this second part would always skip a task if it
was unavailable in the first part, even if it would have otherwise been
covered by other tasks.
This meant the mere presence of an artefact (or lack of presence) could
cause a different build failure.
The issue reproduces if you run a build and populate an sstate feed, then
run a second build off that feed, then run a third build off the sstate
feed of the second build (which is reduced in size).
The fix is, rather than immediately skipping tasks if the checksum is
unavailable, to create a list of missing tasks; then, if a task cannot
be covered by others, we can skip it later. The deferral makes the
behaviour the same even when the cache is "incomplete".
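
A simplified sketch of the deferral, under the assumption that coverage
can be passed in as a plain set (in reality it is worked out as the
setscene tasks are processed). `plan_setscene` and its arguments are
hypothetical names, not the runqueue API.

```python
def plan_setscene(tasks, available, covered):
    """Classify setscene tasks; missing ones are deferred, not skipped at once.

    tasks:     ordered list of setscene task names
    available: set of tasks whose checksum was found in the cache
    covered:   set of tasks already covered by other tasks
    """
    missing = {t for t in tasks if t not in available}   # remembered for later

    to_run, skipped_covered, skipped_missing = [], [], []
    for task in tasks:
        if task in covered:
            # Coverage is decided first, regardless of availability.
            skipped_covered.append(task)
        elif task in missing:
            # Only now is an unavailable, uncovered task skipped.
            skipped_missing.append(task)
        else:
            to_run.append(task)
    return to_run, skipped_covered, skipped_missing
```

The point of the ordering is that a covered task is classified the same
way whether or not its artefact happens to be in the cache.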
[YOCTO #6081]
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
If a setscene task has [depends], it's possible they may still get executed out
of order. The issue is that the dependencies are set to set() for all tasks
involved. This patch adds back in explicit dependencies within these chains
to avoid the setscene task failures.
[YOCTO #6069]
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
This allows the commandline options to be processed in the dump signature
code.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
There is no easy way to make this change. We really need parameters for the -S
(dump signatures) handling code. Such a parameter can then be used within the
codebase to handle the signatures in different ways.
For now, "none" is the recommended default and "printdiff" will execute the
new (and more expensive) comparison algorithms.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
The runqueue should be using the "realtask" ID to lookup the task
hash, not the "task" ID. This patch resolves corruption issues where
incorrect task hashes were displayed within toaster.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Use of waitpid on the worker processes is a bad idea since it conflicts
directly with subprocess internals. Instead use the poll() method
and returncode to determine if the process has exited. If it has,
we can shut down the system.
This should resolve the hangs once and for all, famous last words.
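
A minimal sketch of the poll()/returncode approach using the standard
subprocess module. The helper names and the polling interval are
assumptions for illustration; the real worker management differs.

```python
import subprocess
import sys
import time


def worker_exit_code(proc):
    """Return the worker's exit code, or None while it is still running."""
    proc.poll()              # let subprocess do its own child bookkeeping
    return proc.returncode


def wait_for_worker(proc, interval=0.05, timeout=30.0):
    """Poll until the worker exits; return its exit code or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = worker_exit_code(proc)
        if code is not None:
            return code      # the caller can now shut the system down
        time.sleep(interval)
    return None
```

Because only Popen.poll() touches the child, nothing competes with
subprocess for the exit status.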
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
We're running into processes using 100% cpu. It appears these are locked in
a subprocess.poll() type loop where the process has exited but the code is
looping as it's not handling the ECHILD error.
http://bugs.python.org/issue14396
http://bugs.python.org/issue15756
This is likely due to one or both of the above bugs. The question is what actually
grabbed the child exit code as it wasn't this code. It's likely there is therefore
some other code racing and taking that code; it may be some kind of race like:
http://hg.python.org/cpython/rev/767420808a62/
where the fix effectively catches the child's codes in a different part of the system.
We could try and get everyone onto Python 2.7.4 where the above bugs are fixed; however,
for now it's safer to admit defeat and go back to polling explicitly for our worker exit
codes.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Catching all child exit status values is a bad idea. Setting an http sstate mirror
is a great way to see that spectacularly break things. The previous change did
have good code changes so don't revert those parts.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
There are several problems. Firstly, a return value of "None" can mean
there is a C signal handler installed so we need to better handle that
case. signal.SIG_DFL is 0 which equates to false so we also need to
handle that by testing explicitly for None.
Finally, the signal handler *must* call waitpid on all child processes
else it will just get called repeatedly, leading to the hanging behaviour
we've been seeing. The solution is to only error for the worker children;
we warn about any other stray children, the sources of which we'll have to
figure out in due course.
Hopefully this patch gets things working again properly though.
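
The two return-value cases can be sketched as follows, assuming the
previous handler comes from signal.getsignal() or signal.signal().
`classify_previous_handler` is a hypothetical helper, not part of the
patch.

```python
import signal


def classify_previous_handler(prev):
    """Decide what to do with a previously installed signal handler."""
    if prev is None:
        # Handler was not installed from Python (e.g. a C handler),
        # so it cannot be called or chained from here.
        return "not-python"
    if prev == signal.SIG_DFL or prev == signal.SIG_IGN:
        # SIG_DFL is 0 and therefore falsy; compare explicitly.
        return "default-or-ignore"
    if callable(prev):
        return "chain"
    return "unknown"
```

A plain `if prev:` test would lump None and SIG_DFL together, which is
the confusion the message describes.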
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Failures on the autobuilder look like this handler is recursing. That
shouldn't be possible but it doesn't hurt to code as such.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
We've noticed hanging processes which appear to be looping around
waitpid. It's possible multiple calls to teardown are causing problems,
or in theory multiple registrations (although the code should not
allow that). Regardless, put better guards around signal handler
registration.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
If we didn't set up any workers (such as bitbake -S), this would error
since we're trying to set a signal handler to None. This patch
avoids that problem.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
The sigchld handler was reaping any processes and this was leading to
confusion with any other process handling code that could be active.
This patch:
a) Ensures we only read any process results for the worker processes
we want to monitor
b) Ensures we pass the event to any other sigchld handler if
it isn't an event we're interested in so the functions are properly
chained.
Together this should resolve some of the reports of unknown processes
people have been making.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Instead of a significant number of calls to waitpid, register a SIGCHLD
handler instead.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
A run of "bitbake bash -c unpack" when the task has already been
completed resulted in about 9000 calls to logger.debug(). With this
patch, which comments out some noisy/less useful logging and moves
other logging calls outside loops, this number is reduced to 1000
calls. This results in cleaner logs and gives a small but
measurable 0.15s speedup. The log size dropped from 900kb to 160kb.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
If the worker has already gone missing (e.g. SIGTERM), we should
gracefully handle the write failures at exit time rather than throwing
ugly tracebacks.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
If the worker (or fakeworker) process disappears for some reason, the
system doesn't currently even notice. To fix this, we call waitpid
periodically, looking for exit events of our children. If these
occur, we can gracefully shut down the server.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
slef.self is clearly meant to be self, fix typo.
Otavio spotted and reported, thanks.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
* An exception like this keeps spinning quite quickly, generating GBs of logs;
better to kill it ASAP and show the invalid pickle
Signed-off-by: Martin Jansa <Martin.Jansa@gmail.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
A previous commit of mine used the target variable for two different uses
resulting in a lot more sstate being installed than is needed.
Fix the variable to use two different names and unbreak the setscene
behaviour.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
If you specify multiple targets on bitbake's commandline and some of them are
setscene tasks which are "masked" by other tasks they may not get run.
For example <image>:do_rootfs <kernel>:do_populate_sysroot
the rootfs task "masks" the populate_sysroot task so bitbake would currently
decide not to run it. In this case, we do really want it to be run.
The fix is not to skip anything which has been given as an explicit target.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Commit c54e738e2b5dc0d8e6fd8e93b284ed96e7a83051 added in the idea of hard dependencies
such as the case a setscene has a hard dependency on pseudo-native and that
dependency wasn't available from sstate for some reason.
Unfortunately the implementation was a bit too enthusiastic, causing rebuilds
of things when it wasn't necessary. A test case was:
bitbake quilt-native
bitbake quilt-native -c clean
bitbake <some-image>
and then you'd watch quilt-native get rebuilt for no good reason.
The clue to the problem is in the for loop where it never depends on
the item being iterated over.
The fix is to include the exact list of hard dependencies rather than
guessing. With these changes, the use case above works and the one in
the original commit also works.
This patch also adds in or cleans up various pieces of logging to
allow issues like this to be more easily debugged in future.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Shared work directories work by assuming bitbake will not run
more than one task with a specific stamp name. Recent runqueue optimisations
accidentally broke this, meaning there could be races. This fixes the code.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Based upon the list of difference starting points, we can use the siggen.find_siginfo()
function call and the difference printing code to provide a list of differences
between the current build target and whatever can be obtained from the sstate cache.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
The way hash_deps was being generated was different to the way siggen generated
the data internally, which led to seemingly different sigdata/siginfo files
for the same checksum. The -S output was correct but the files written during
builds contained superfluous data which would look like a difference.
This patch removes the badly duplicated data and uses it from the source,
which ensures it's consistent.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
with an sstate cache
It's useful to understand where the delta starts against an existing sstate cache
for a given target. Adding this to the output of the -S option seems like a
natural fit.
We use the hashvalidate function to figure this out and assume it can find siginfo
files for more than just the setscene tasks.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Currently tasks have no knowledge of which other tasks they depend
upon. This makes it impossible to do at least two things which would be
desirable/interesting:
a) Have the ability to create per recipe sysroots
b) Allow the aclocal files to be present only for the entries in
DEPENDS (directly and indirectly)
By exporting task data through this new variable, tasks can inspect
their dependencies and then take actions based upon this.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
This unlikely looking function was found to be eating a lot of CPU time
since it gets called once per trip through the idle loop if we're not
running a maximum number of processes. This was particularly true in
world builds of 13,000 tasks.
Calling the computation code is pretty pointless because until some
other task finishes nothing is going to become available to build.
We can know when things become available so this patch teaches the
scheduler this knowledge.
It also:
* skips any computation when nothing can be built
* if there is only one available item to build, ignore the priority map
* precomputes the stamp filenames, rather than doing it every time
* saves the length of the array rather than calculating it each time
(the extra function overhead is significant)
Timing wise, initially, 5000 iterations through here took 20s; with
the patch, 200000 calls take the same time. The end result is that
builds get up and running faster.
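
The shortcuts listed above can be sketched roughly as below. The class,
its method names and the "lower rank runs first" priority map are
assumptions for illustration, not the actual scheduler.

```python
class BuildableCache:
    """Track buildable tasks so idle-loop calls do no work when nothing changed."""

    def __init__(self, priorities):
        self.priorities = priorities   # task -> rank, lower runs first (assumed)
        self.buildable = []

    def task_became_buildable(self, task):
        # Called when a finishing task makes `task` available.
        self.buildable.append(task)

    def next_task(self):
        if not self.buildable:
            return None                      # nothing to build: skip all computation
        if len(self.buildable) == 1:
            return self.buildable.pop()      # single candidate: ignore the priority map
        best = min(self.buildable, key=self.priorities.__getitem__)
        self.buildable.remove(best)
        return best
```

The idle loop can call next_task() as often as it likes; the priority
lookup only happens when there is a real choice to make.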
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
When using the dry run option (-n), bitbake would still try and fire
a specific fakeroot worker. This is doomed to failure since it might
well not have been built.
Add in some checks to prevent the failures.
[YOCTO #5367]
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
We have do_bundle_initramfs which is a task inserted after compile and
before build. It is not covered by sstate.
If we run a build with a valid sstate cache present, the setsceneverify
function realises it will rerun the do_compile step (due to the
bundle_initramfs task) and hence marks do_populate_sysroot to rerun.
do_install, a dependency of do_populate_sysroot, is left marked as
covered by sstate.
What we need to do is traverse the dependency tree for any setsceneverify
invalidated task and ensure any dependencies are also invalidated. We can
stop at any point we reach another setscene task though.
This means the do_populate_sysroot task has the data from do_install
available and doesn't crash.
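
The traversal can be sketched as a simple graph walk. The function name,
the `depends` dict and the task naming are hypothetical; the real code
works on runqueue task IDs.

```python
def invalidate_with_dependencies(start, depends, setscene_tasks, covered):
    """Un-cover `start` and its dependencies, stopping at other setscene tasks.

    depends:        task -> iterable of tasks it depends on
    setscene_tasks: tasks that have their own setscene variant
    covered:        set of tasks currently marked as covered (modified in place)
    """
    stack = [start]
    seen = set()
    while stack:
        task = stack.pop()
        if task in seen:
            continue
        seen.add(task)
        covered.discard(task)              # this task must really run now
        for dep in depends.get(task, ()):
            if dep in setscene_tasks:
                continue                   # another setscene task: stop here
            stack.append(dep)
    return seen
```

In the scenario above, invalidating do_populate_sysroot also un-covers
do_install, so its output exists when do_populate_sysroot reruns.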
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Adding the sstate-related hash for all runqueue and
scenequeue tasks, as it's needed in the WebHob data.
Signed-off-by: Alexandru DAMIAN <alexandru.damian@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
This patch adds task identifying information for all
runQueue and sceneQueue events, and for bb.build.Task* events.
This will allow matching event to specific tasks in the UI
handlers processing these events.
Adds RunQueueData functions to get the task name and task
file for usage with the runQueue* events.
Adds taskfile and taskname properties to bb.build.TaskBase.
Adds taskfile and taskname properties to the *runQueue* events
Signed-off-by: Alexandru DAMIAN <alexandru.damian@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Adding a CookerFeature that allows UIs to enable
receiving a dependency tree once the task data has been
computed and the runQueue is ready to start.
This will allow the clients to display dependency
data in an efficient manner, and not recompute the runqueue
specifically to get the dependency data.
Signed-off-by: Alexandru DAMIAN <alexandru.damian@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Adding a runQueueTaskSkipped event to notify about tasks that are not
run, either because they are set-scened or because they don't need an
update (timestamp was ok).
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Adding an event to be fired when a scene task is completed.
It is analogous to the run task completed event, and has
been missing for some reason.
Signed-off-by: Alexandru DAMIAN <alexandru.damian@intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
There would be a race issue if we:
$ bitbake make-3.81 make-3.82
This is because they are being built at the same time, which would cause
unexpected problems, for example:
[snip]
ERROR: Package already staged (/path/to/tmp/sstate-control/manifest-qemux86-make.populate-sysroot)?!
ERROR: Function failed: sstate_task_postfunc
[snip]
Or there would be a Python stack trace such as:
[snip]
*** 0004: mfile = open(manifest)
0005: entries = mfile.readlines()
0006: mfile.close()
0007:
0008: for entry in entries:
Exception: IOError: [Errno 2] No such file or directory: xxx
[snip]
[YOCTO #5094]
We can quit earlier to avoid this kind of issue when two versions of the same PN
are going to be built since this isn't supported.
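
A sketch of such an early check, assuming the targets have already been
resolved to (PN, PV) pairs. The function name, the input layout and the
error text are made up for this example.

```python
def check_single_version(targets):
    """Raise early if two different versions of the same PN are requested.

    targets: iterable of (pn, pv) tuples (hypothetical layout)
    """
    seen = {}
    for pn, pv in targets:
        if pn in seen and seen[pn] != pv:
            raise ValueError(
                "Multiple versions of %s are due to be built (%s and %s); "
                "only one version of a given PN should be built per run"
                % (pn, seen[pn], pv)
            )
        seen[pn] = pv
```

Failing here, before any task runs, avoids the confusing manifest errors
that otherwise show up much later in the build.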
Signed-off-by: Robert Yang <liezhi.yang@windriver.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Due to the worker split the ${DATE} and ${TIME} variables could end up
with different values for different workers.
E.g., a task like do_rootfs that is run within a fakeroot environment
had a slightly different view of the time than another task that was not
fakerooted which made it impossible to correctly refer to the image
generated by do_rootfs from the other task.
Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
BUILDNAME is set from cooker by default, so since the worker split it
will not be set when executing functions. In OpenEmbedded this results
in /etc/version (which is populated from BUILDNAME) not having any
content. Pass this variable value through to the worker explicitly to
fix the issue.
Fixes [YOCTO #4818].
Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
This was missed off in a previous patch.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
time.sleep()
The existing backend server implementations were inefficient since they
were sleeping for the full length of the timeouts rather than being woken when
there was data ready for them. It was assumed they would wake and perhaps did
when we forked processes directly but that is no longer the case.
This updates both the process and xmlrpc backends to wait using select(). This
does mean we need to pass the file descriptors to wait on from the internals
which know what these file descriptors are, but this is a logical improvement.
Tests of a pathological load on the process server of ~420 rapid tasks
executed on a server with BB_NUMBER_THREADS=48 went from a wall clock
measurement of the overall command execution time of 75s to a much more
reasonable 24s.
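
The difference from time.sleep() can be sketched with the standard
select module (POSIX pipes assumed). `wait_for_data` is an illustrative
helper, not the server's actual function.

```python
import os
import select


def wait_for_data(read_fds, timeout):
    """Block until one of read_fds is readable or `timeout` seconds pass.

    Unlike time.sleep(timeout), this returns as soon as data arrives.
    """
    ready, _, _ = select.select(read_fds, [], [], timeout)
    return ready
```

The caller passes in whichever descriptors the internals are waiting on,
and an empty list comes back only if the full timeout elapsed.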
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|
|
Help to pick up mistakes such as "bitbake -c cleanstate xyz" (instead
of "bitbake -c cleansstate xyz".)
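
One way to produce such hints is difflib from the standard library. This
is a sketch under that assumption; `suggest_tasks` and the task list are
hypothetical.

```python
import difflib


def suggest_tasks(requested, known_tasks):
    """Return known task names that look like the mistyped one, best first."""
    return difflib.get_close_matches(requested, known_tasks)
```

For a request of "do_cleanstate" against a list that contains
"do_cleansstate", the correctly spelled task is the first suggestion.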
Signed-off-by: Paul Eggleton <paul.eggleton@linux.intel.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
|