author Alejandro Hernandez <alejandro.hernandez@linux.intel.com> 2017-06-20 14:11:44 -0700
committer Richard Purdie <richard.purdie@linuxfoundation.org> 2017-09-14 14:07:53 +0100
commit 7295e5727bfb51eb12ea4e741fcbcf3a72f09468
tree ba24ea2c1d533e15d753f2c096f4c1fe64342cb4 /meta/recipes-devtools/python/python/create_manifest2.py
parent 1781f9f3c893c76656f0bd5879a0cdb5cbe158fe
python: Restructure python packaging and replace it with autopackaging

The reason we have a manifest file for python is that our goal is to keep
python-core as small as possible and add other python packages only when
the user needs them, hence why we split upstream python into several
packages.

However, our manifest file has several issues:

- It's unorganized and hard for an average human being to read and
  understand.

- When a new package needs to be added, the user actually has to modify
  the script that creates the manifest, then call the script to create a
  new manifest, and then submit a patch for both the script and the
  manifest, so it's a little convoluted.

- Git complains every single time a patch is submitted to the manifest,
  since it violates some of its guidelines.

- It changes or may change with every release of python, and it's
  impossible to know if the required files for a certain package have
  changed (it could have more or fewer dependencies). The only way of
  doing so would be to install and test them all one by one on separate
  individual images, and even then we wouldn't know if they require fewer
  dependencies, we would just know if an extra dependency is required
  since it would complain. Let's face it, this isn't feasible.

- The same thing happens for new packages: if someone wants to add a new
  package, its dependencies need to be checked manually one by one.

This patch fixes those issues, while adding some additional features.

Features/Fixes:

- A new manifest format is used (JSON), easy to read and understand.
  This file is parsed by the python recipe, and python packages read from
  here are passed directly to bitbake during parsing time.

- It provides an automatic manifest creation task (explained below),
  which automagically checks every package's dependencies and adds them
  to the new manifest, hence each package will contain exactly what that
  package needs to run, providing finer granularity.

- Dependencies are also checked automagically for new packages
  (explained below).

- Fixes the manifest in the following ways:
  * python-core should be the base and all packages should depend on it;
    fixes lang, string, codecs, etc.
  * Fixes packages with repeated files (e.g. bsddb and db, or netclient
    and mime, and many others).

- Removes the manifest from the python-native recipe (why was it there in
  the first place? Native recipes do not get split).

- Sitecustomize was fixed since encoding was deprecated.

- The JSON manifest file invalidates bitbake's cache, so if it changes
  the python package will be rebuilt.

- It creates a solution for users that want precompiled bytecode files
  (*.pyc): INCLUDE_PYCS = "1" can be set by the user in their local.conf
  to include such files. Some argue they get faster boot time (even
  though the files would be created on their first run?), but they also
  sometimes give a magic number error and take up space, so we leave it
  to the user to decide if they want them or not.

- Fixes python-core dependencies, e.g. when python is run on an image, it
  TRIES to import everything it needs, but it doesn't necessarily fail
  when it doesn't find something, so even if we didn't know, we had
  errors like (trimmed on purpose):

    # trying /usr/lib/python2.7/_locale.so
    # trying /usr/lib/python2.7/lib-dynload/_locale.so
    # trying /usr/lib/python2.7/_sysconfigdata.so

  While it didn't complain about _locale, it should have imported it.
  After creating a new manifest with the automated script we get:

    # trying /usr/lib/python2.7/lib-dynload/_locale.so
    dlopen("/usr/lib/python2.7/lib-dynload/_locale.so", 2);
    import _locale # dynamically loaded from /usr/lib/python2.7/lib-dynload/_locale.so

How to use (after a new release of python, or maybe before every OE
release):

- A new task called create_manifest was added to the python package,
  which may be invoked via:

    $ bitbake python -c create_manifest

This task runs a script on native python on our HOST system, and since
the python and python-native packages come from the same source, we can
use it to know the dependencies of each module as if we were doing it on
an image. This script is called create_manifest.py and in a very
simplistic way it does the following:

1. Reads the JSON manifest file and creates a dictionary data structure
   with all of our python packages, their FILES, RDEPENDS and SUMMARY.

2. Loops through all of them and runs every module listed on them
   asynchronously, determining every dependency that they have.

3. These module dependencies are then handled, to be able to know which
   packages contain those files and which should RDEPEND on one another.

4. The data structure that comes out of this is then used to create a
   new manifest file which is automatically copied onto the user's python
   directory, replacing the old one.

Create_manifest script features:

- Handles modules which don't exist anymore (new release for example).
- Handles modules that are builtin.
- Deals with modules which were not compiled (e.g. bsddb or ossaudiodev).
- Deals with packages which include folders.
- Deals with packages which include FILES with a wildcard.
- The manifest can be constructed on a multilib environment as well.
- This method works for both python modules and shared libraries used by
  python.

How to add a new package:

- If a user wants to add a new package, all that has to be done is modify
  the python2-manifest.json file and add the required file(s) to the
  FILES list; the script should handle all the rest.

  Real example:
  We want to add a web browser package, including the file webbrowser.py
  which at the moment is on python-misc.

    "webbrowser": {
        "files": ["${libdir}/python2.7/lib-dynload/webbrowser.py"],
        "rdepends": [],
        "summary": "Python Web Browser support"}

  Run bitbake python -c create_manifest and the resulting manifest should
  be completed after a few seconds, showing something like:

    "webbrowser": {
        "files": ["${libdir}/python2.7/webbrowser.py"],
        "rdepends": ["core","fcntl","io","pickle","shell","subprocess"],
        "summary": "Python Web Browser support"}

Known errors/issues:

- Some special packages are handled differently: core, misc, modules,
  dev, staticdev. All these should be handled manually, because they
  include binaries, static libraries, include files, etc. (something that
  we can't import). Specifically, static libraries are not supported by
  this method and have to be handled by the user.

- The change should be transparent to the user, other than the fact that
  now we CAN'T build python-foo (it was pretty dumb anyway, since what
  building python-foo actually did was build the whole python package),
  but doing IMAGE_INSTALL_append = " python-foo" would create an image
  with the requested package with no issues.

[YOCTO #11510]
[YOCTO #11694]
[YOCTO #11695]

Signed-off-by: Alejandro Hernandez <alejandro.hernandez@linux.intel.com>
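
The FILES/RDEPENDS split described in steps 3 and 4 of the commit message can be illustrated with a minimal sketch. This is not the script from this patch: the function name resolve_deps, the toy manifest and the dependency list are all hypothetical, and the real script obtains the dependency list by running get_module_deps2.py for each module. The sketch also starts from the package's existing FILES, which is a simplification.

```python
def resolve_deps(manifest, pkg, deps):
    """Split dependency files into files to own and packages to RDEPEND on.

    Hypothetical helper for illustration only.
    """
    # Every package except core depends on core.
    rdepends = [] if pkg == "core" else ["core"]
    files = list(manifest[pkg]["files"])
    for dep in deps:
        if dep in files:
            continue
        # Look for another package that already ships this file.
        owner = next(
            (name for name, entry in manifest.items()
             if name != pkg and dep in entry["files"]),
            None,
        )
        if owner is None:
            # Nobody else ships it, so it becomes part of FILES.
            files.append(dep)
        elif owner not in rdepends:
            rdepends.append(owner)
    return sorted(files), sorted(rdepends)


# Toy manifest and dependency list (assumed values, not real output).
manifest = {
    "core": {"files": ["${libdir}/python2.7/os.py"]},
    "subprocess": {"files": ["${libdir}/python2.7/subprocess.py"]},
    "webbrowser": {"files": ["${libdir}/python2.7/webbrowser.py"]},
}
deps = [
    "${libdir}/python2.7/os.py",
    "${libdir}/python2.7/subprocess.py",
    "${libdir}/python2.7/webbrowser.py",
    "${libdir}/python2.7/shlex.py",
]
files, rdepends = resolve_deps(manifest, "webbrowser", deps)
print(files)
print(rdepends)
```

In this toy run, shlex.py is not shipped by any other package, so it lands in the webbrowser FILES list, while os.py and subprocess.py turn into RDEPENDS on the packages that already contain them.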
Diffstat (limited to 'meta/recipes-devtools/python/python/create_manifest2.py')
-rw-r--r-- meta/recipes-devtools/python/python/create_manifest2.py 277
1 files changed, 277 insertions, 0 deletions
diff --git a/meta/recipes-devtools/python/python/create_manifest2.py b/meta/recipes-devtools/python/python/create_manifest2.py
new file mode 100644
index 0000000000..4c55bd7d7a
--- /dev/null
+++ b/meta/recipes-devtools/python/python/create_manifest2.py
@@ -0,0 +1,277 @@
+# This script is used as a bitbake task to create a new python manifest
+# $ bitbake python -c create_manifest
+#
+# Our goal is to keep python-core as small as possible and add other python
+# packages only when the user needs them, hence why we split upstream python
+# into several packages.
+#
+# In a very simplistic way what this does is:
+# Launch python and see specifically what is required for it to run at a minimum
+#
+# Go through the python-manifest file and launch a separate task for every single
+# one of the files on each package, this task will check what was required for that
+# specific module to run, these modules will be called dependencies.
+# The output of such task will be a list of the modules or dependencies that were
+# found for that file.
+#
+# Such output will be parsed by this script, we will look for each dependency on the
+# manifest and if we find that another package already includes it, then we will add
+# that package as an RDEPENDS to the package we are currently checking; in case we don't
+# find the current dependency on any other package we will add it to the current package
+# as part of FILES.
+#
+#
+# This way we will create a new manifest from the data structure that was built during
+# this process, on this new manifest each package will contain specifically only
+# what it needs to run.
+#
+# There are some caveats which we try to deal with, such as repeated files on different
+# packages, packages that include folders, wildcards, and special packages.
+# It's also important to note that this method only works for python files, and shared
+# libraries. Static libraries, header files and binaries need to be dealt with manually.
+#
+# Author: Alejandro Enedino Hernandez Samaniego "aehs29" <alejandro.hernandez@intel.com>
+
+
+import sys
+import subprocess
+import json
+import os
+
+# Hack to get native python search path (for folders), not fond of it but it works for now
+pivot='recipe-sysroot-native'
+for p in sys.path:
+    if pivot in p:
+        nativelibfolder=p[:p.find(pivot)+len(pivot)]
+
+# Empty dict to hold the whole manifest
+new_manifest = {}
+
+# Check for repeated files, folders and wildcards
+allfiles=[]
+repeated=[]
+wildcards=[]
+
+hasfolders=[]
+allfolders=[]
+
+def isFolder(value):
+    if os.path.isdir(value.replace('${libdir}',nativelibfolder+'/usr/lib')) or os.path.isdir(value.replace('${libdir}',nativelibfolder+'/usr/lib64')) or os.path.isdir(value.replace('${libdir}',nativelibfolder+'/usr/lib32')):
+        return True
+    else:
+        return False
+
+# Read existing JSON manifest
+with open('python2-manifest.json') as manifest:
+    old_manifest=json.load(manifest)
+
+
+# First pass to get core-package functionality, because we base everything on the fact that core is actually working
+# Not exactly the same so it should not be a function
+print ("Getting dependencies for core package:")
+
+# Special call to check for core package
+output = subprocess.check_output([sys.executable, 'get_module_deps2.py', 'python-core-package'])
+for item in output.split():
+    # We append it so it doesnt hurt what we currently have:
+    if item not in old_manifest['core']['files']:
+        # We use the same data structure since its the one which will be used to check
+        # dependencies for other packages
+        old_manifest['core']['files'].append(item)
+
+for value in old_manifest['core']['files']:
+    # Ignore folders, since we don't import those, difficult to handle multilib
+    if isFolder(value):
+        # Pass it directly
+        if value not in old_manifest['core']['files']:
+            old_manifest['core']['files'].append(value)
+    # Ignore binaries, since we don't import those, assume it was added correctly (manually)
+    if '${bindir}' in value:
+        # Pass it directly
+        if value not in old_manifest['core']['files']:
+            old_manifest['core']['files'].append(value)
+        continue
+    # Ignore empty values
+    if value == '':
+        continue
+    if '${includedir}' in value:
+        if value not in old_manifest['core']['files']:
+            old_manifest['core']['files'].append(value)
+        continue
+    # Get module name , shouldnt be affected by libdir/bindir
+    value = os.path.splitext(os.path.basename(os.path.normpath(value)))[0]
+
+
+    # Launch separate task for each module for deterministic behavior
+    # Each module will only import what is necessary for it to work in specific
+    print ('Getting dependencies for module: %s' % value)
+    output = subprocess.check_output([sys.executable, 'get_module_deps2.py', '%s' % value])
+    for item in output.split():
+        # We append it so it doesnt hurt what we currently have:
+        if item not in old_manifest['core']['files']:
+            old_manifest['core']['files'].append(item)
+
+# We check which packages include folders
+for key in old_manifest:
+    for value in old_manifest[key]['files']:
+        # Ignore folders, since we don't import those, difficult to handle multilib
+        if isFolder(value):
+            print ('%s is a folder' % value)
+            if key not in hasfolders:
+                hasfolders.append(key)
+            if value not in allfolders:
+                allfolders.append(value)
+
+for key in old_manifest:
+    # Use an empty dict as data structure to hold data for each package and fill it up
+    new_manifest[key]={}
+    new_manifest[key]['files']=[]
+    new_manifest[key]['rdepends']=[]
+    # All packages should depend on core
+    if key != 'core':
+        new_manifest[key]['rdepends'].append('core')
+    new_manifest[key]['summary']=old_manifest[key]['summary']
+
+    # Handle special cases, we assume that when they were manually added
+    # to the manifest we knew what we were doing.
+    print ('Handling package %s' % key)
+    special_packages=['misc', 'modules', 'dev']
+    if key in special_packages or 'staticdev' in key:
+        print('Passing %s package directly' % key)
+        new_manifest[key]=old_manifest[key]
+        continue
+
+    for value in old_manifest[key]['files']:
+        # We already handled core on the first pass
+        if key == 'core':
+            new_manifest[key]['files'].append(value)
+            continue
+        # Ignore folders, since we don't import those, difficult to handle multilib
+        if isFolder(value):
+            # Pass folders directly
+            new_manifest[key]['files'].append(value)
+        # Ignore binaries, since we don't import those
+        if '${bindir}' in value:
+            # Pass it directly to the new manifest data structure
+            if value not in new_manifest[key]['files']:
+                new_manifest[key]['files'].append(value)
+            continue
+        # Ignore empty values
+        if value == '':
+            continue
+        if '${includedir}' in value:
+            if value not in new_manifest[key]['files']:
+                new_manifest[key]['files'].append(value)
+            continue
+        # Get module name , shouldnt be affected by libdir/bindir
+        value = os.path.splitext(os.path.basename(os.path.normpath(value)))[0]
+
+        # Launch separate task for each module for deterministic behavior
+        # Each module will only import what is necessary for it to work in specific
+        print ('Getting dependencies for module: %s' % value)
+        output = subprocess.check_output([sys.executable, 'get_module_deps2.py', '%s' % value])
+
+        # We can print dependencies for debugging purposes
+        #print (output)
+        # Output will have all dependencies
+        for item in output.split():
+
+            # Warning: This first part is ugly
+            # One of the dependencies that was found, could be inside of one of the folders included by another package
+            # We need to check if this happens so we can add the package containing the folder as an RDEPENDS
+            # e.g. Folder encodings contained in codecs
+            # This would be solved if no packages included any folders
+
+            # This can be done in two ways:
+            # 1 - We assume that if we take out the filename from the path we would get
+            # the folder string, then we would check if folder string is in the list of folders
+            # This would not work if a package contains a folder which contains another folder
+            # e.g. path/folder1/folder2/filename folder_string= path/folder1/folder2
+            # folder_string would not match any value contained in the list of folders
+            #
+            # 2 - We do it the other way around, checking if the folder is contained in the path
+            # e.g. path/folder1/folder2/filename folder_string= path/folder1/folder2
+            # is folder_string inside path/folder1/folder2/filename?,
+            # Yes, it works, but we waste a couple of milliseconds.
+
+            inFolders=False
+            for folder in allfolders:
+                if folder in item:
+                    inFolders = True # Did we find a folder?
+                    folderFound = False # Second flag to break inner for
+                    # Loop only through packages which contain folders
+                    for keyfolder in hasfolders:
+                        if (folderFound == False):
+                            #print("Checking folder %s on package %s" % (item,keyfolder))
+                            for file_folder in old_manifest[keyfolder]['files']:
+                                if file_folder==folder:
+                                    print ('%s found in %s' % (folder, keyfolder))
+                                    folderFound = True
+                                    if keyfolder not in new_manifest[key]['rdepends'] and keyfolder != key:
+                                        new_manifest[key]['rdepends'].append(keyfolder)
+                        else:
+                            break
+
+            # A folder was found so we're done with this item, we can go on
+            if inFolders:
+                continue
+
+            # We might already have it on the dictionary since it could depend on a (previously checked) module
+            if item not in new_manifest[key]['files']:
+                # Handle core as a special package, we already did it so we pass it to NEW data structure directly
+                if key=='core':
+                    print('Adding %s to %s FILES' % (item, key))
+                    if item.endswith('*'):
+                        wildcards.append(item)
+                    new_manifest[key]['files'].append(item)
+
+                    # Check for repeated files
+                    if item not in allfiles:
+                        allfiles.append(item)
+                    else:
+                        repeated.append(item)
+
+                else:
+
+                    # Check if this dependency is already contained on another package, so we add it
+                    # as an RDEPENDS, or if its not, it means it should be contained on the current
+                    # package, so we should add it to FILES
+                    for newkey in old_manifest:
+                        # Debug
+                        #print("Checking %s " % item + " in %s" % newkey)
+                        if item in old_manifest[newkey]['files']:
+                            # Since were nesting, we need to check its not the same key
+                            if(newkey!=key):
+                                if newkey not in new_manifest[key]['rdepends']:
+                                    # Add it to the new manifest data struct
+                                    # Debug
+                                    print('Adding %s to %s RDEPENDS, because it contains %s' % (newkey, key, item))
+                                    new_manifest[key]['rdepends'].append(newkey)
+                                break
+                    else:
+                        # Debug
+                        print('Adding %s to %s FILES' % (item, key))
+                        # Since it wasnt found on another package, its not an RDEP, so add it to FILES for this package
+                        new_manifest[key]['files'].append(item)
+                        if item.endswith('*'):
+                            wildcards.append(item)
+                        if item not in allfiles:
+                            allfiles.append(item)
+                        else:
+                            repeated.append(item)
+
+print ('The following files are repeated (contained in more than one package), please check which package should get it:')
+print (repeated)
+print('The following files contain wildcards, please check they are necessary')
+print(wildcards)
+print('The following files contain folders, please check they are necessary')
+print(hasfolders)
+
+# Sort it just so it looks nice
+for key in new_manifest:
+    new_manifest[key]['files'].sort()
+    new_manifest[key]['rdepends'].sort()
+
+# Create the manifest from the data structure that was built
+with open('python2-manifest.json.new','w') as outfile:
+    json.dump(new_manifest,outfile,sort_keys=True, indent=4)
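
The folder check that the script's comments call approach 2 (testing whether a packaged folder string is contained in the dependency path) can be shown in isolation. The sketch below is only an illustration: the helper name packages_owning_folder and the folder_owners mapping are hypothetical, and the "json" entry is an assumed example. Only the "encodings contained in codecs" case comes from the script's own comments.

```python
def packages_owning_folder(dep_path, folder_owners):
    """Return the packages whose folder entry appears inside dep_path.

    Hypothetical helper; folder_owners maps a folder entry to the
    package that lists it in FILES.
    """
    owners = []
    for folder, pkg in folder_owners.items():
        # Approach 2: is the folder string contained in the dependency
        # path? A dependency inside a nested subfolder still matches.
        if folder in dep_path and pkg not in owners:
            owners.append(pkg)
    return sorted(owners)


folder_owners = {
    "${libdir}/python2.7/encodings": "codecs",
    "${libdir}/python2.7/json": "json",  # assumed entry, for illustration
}
nested = packages_owning_folder(
    "${libdir}/python2.7/encodings/sub/utf_8.py", folder_owners)
plain = packages_owning_folder("${libdir}/python2.7/os.py", folder_owners)
print(nested)
print(plain)
```

Because this is a plain substring test, a folder entry such as ${libdir}/python2.7/json would also match a sibling file whose name merely starts with "json". Comparing against the folder string plus a trailing "/" would be a stricter variant.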