
Analyze and Visualize Data with Android APK


By: anomali8888

September 21, 2017

Analyze and Visualize Android APK Data with Lint in Python CGI Programming

In this brief tutorial, I will show you, step by step, how to analyze an Android APK with Android Lint and how to create a nice graphical output of the results on a Python CGI web page.

Note: Make sure you have installed Android Lint (part of the Android SDK) on your operating system, and that you have enabled and configured a CGI directory for Python scripts on your system.

Step 1: Use Android Lint to analyze an Android APK

Before we start, you should know what Android Lint is. According to tools.android.com, "Android Lint is a new tool introduced in ADT 16 (and Tools 16) which scans Android project sources for potential bugs."

Here are some examples of the types of errors that it looks for:

  • Missing translations (and unused translations)

  • Layout performance problems (all the issues the old layoutopt tool used to find, and more)

  • Unused resources

  • Inconsistent array sizes (when arrays are defined in multiple configurations)

  • Accessibility and internationalization problems (hardcoded strings, missing contentDescription, etc)

  • Icon problems (like missing densities, duplicate icons, wrong sizes, etc)

  • Usability problems (like not specifying an input type on a text field)

  • Manifest errors
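Lint can also write everything it finds to an XML report (Step 2 below reads such a report, lint-result.xml). In that report each issue element usually carries a category attribute, such as Correctness, Performance, Accessibility or Internationalization, which lines up roughly with the kinds of problems listed above. The following is a minimal sketch that groups reported issue ids by category, assuming that attribute is present (it falls back to "Uncategorized" otherwise). The function issues_by_category and the sample report are hypothetical, made up for illustration rather than taken from real lint output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, hand-written excerpt of a lint XML report (not real output).
SAMPLE_REPORT = """<issues>
  <issue id="HardcodedText" severity="Warning" category="Internationalization"
         message="Hardcoded string &quot;Hello&quot;, should use @string resource"
         summary="Hardcoded text">
    <location file="res/layout/main.xml" line="12" column="9"/>
  </issue>
  <issue id="ContentDescription" severity="Warning" category="Accessibility"
         message="Missing contentDescription attribute on image"
         summary="Image without contentDescription">
    <location file="res/layout/main.xml" line="20" column="6"/>
  </issue>
  <issue id="UnusedResources" severity="Warning" category="Performance"
         message="The resource R.string.old_label appears to be unused"
         summary="Unused resources">
    <location file="res/values/strings.xml" line="5" column="13"/>
  </issue>
</issues>"""


def issues_by_category(report_xml):
    """Map each lint category to the list of issue ids reported under it."""
    root = ET.fromstring(report_xml)
    groups = defaultdict(list)
    for issue in root.iter("issue"):
        # Assumption: each <issue> has a "category" attribute; fall back if it is missing.
        groups[issue.get("category", "Uncategorized")].append(issue.get("id"))
    return dict(groups)


print(issues_by_category(SAMPLE_REPORT))
```

With the sample above this prints one entry per category, each holding a single issue id.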

Lint also finds hardcoded strings in the Java classes of your app, so it is a really good tool for anyone who wants to improve their code and check an APK's code for bugs or vulnerabilities. Now that we have covered the theory we need before using Android Lint, let's start coding.

To generate the analysis data, I will use the tool "KeepSafe / android-resource-remover" from GitHub (link: https://github.com/KeepSafe/android-resource-remover), but I will customize its code slightly: instead of removing the problems it detects, I will alter the code so that it only finds them.
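The idea behind this customization (report the problems instead of deleting them) can be sketched independently of android-resource-remover's own code. This sketch assumes, as I understand the tool, that the resources it removes are the ones lint reports under the UnusedResources issue id; the function report_unused_resources and the sample report are hypothetical and only illustrate the "find, don't remove" approach.

```python
import xml.etree.ElementTree as ET

# Hypothetical report excerpt, for illustration only.
SAMPLE_REPORT = """<issues>
  <issue id="UnusedResources" severity="Warning"
         message="The resource R.string.old_label appears to be unused">
    <location file="res/values/strings.xml" line="5"/>
  </issue>
  <issue id="HardcodedText" severity="Warning" message="Hardcoded string">
    <location file="res/layout/main.xml" line="12"/>
  </issue>
</issues>"""


def report_unused_resources(report_xml):
    """List (file, message) pairs for UnusedResources issues; nothing is deleted."""
    root = ET.fromstring(report_xml)
    found = []
    for issue in root.iter("issue"):
        # Assumption: UnusedResources is the check whose findings the remover acts on.
        if issue.get("id") != "UnusedResources":
            continue
        for location in issue.findall("location"):
            found.append((location.get("file"), issue.get("message")))
    return found


for path, message in report_unused_resources(SAMPLE_REPORT):
    print(path + ": " + message)
```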

You can install android-resource-remover with pip or download it from the GitHub link I provided:

Command:

pip install android-resource-remover

(Note: I downloaded the zip file from GitHub instead, so that I could edit and customize the code.)

After downloading and extracting the zip file, open the android_clean_app.py file. As shown in the screenshot, comment out the part of the code that removes the resources and values, so that the script only finds the problems for us.

From this point you have two choices:

1. Use Python 2 to run the program:

python android_clean_app.py [--lint LINT] [--app APP] [--xml XML]

2. Call the program from Python 3:

I chose the second option because my project is an automated tool for analyzing Android APKs written in Python 3, while android_clean_app.py is written in Python 2. When I tried to convert the script to Python 3, I ran into dependency problems, so to save time I run it as a separate process with subprocess and pipe its output back.

The following Python 3 code executes android_clean_app.py:

#!/usr/bin/python3
import subprocess

lint_path = "/usr/share/android-sdk/lint"  # path of my lint tool
app_path = ""  # fill in your own app directory (the XML result is written here)

# android_clean_app.py is Python 2 code, so run it with a Python 2 interpreter
process = subprocess.Popen(['python2', 'android_clean_app.py', '--lint', lint_path, '--app', app_path],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, error = process.communicate()
print(out.decode())  # print the output for debugging
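If you prefer the higher-level subprocess.run (Python 3.5+), a variant might look like the sketch below. check=True turns a failing run into an exception instead of a silently empty result. The helper names build_remover_command and run_remover are hypothetical, the "python2" interpreter name assumes a Python 2 binary on your PATH, and the 600-second timeout is an arbitrary choice.

```python
import subprocess


def build_remover_command(lint_path, app_path, python="python2",
                          script="android_clean_app.py"):
    """Return the argument list for running the customized android_clean_app.py."""
    return [python, script, "--lint", lint_path, "--app", app_path]


def run_remover(lint_path, app_path, timeout=600):
    """Run the script and return its decoded stdout.

    check=True raises subprocess.CalledProcessError on a non-zero exit code,
    and exceeding the timeout raises subprocess.TimeoutExpired.
    """
    result = subprocess.run(
        build_remover_command(lint_path, app_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
        timeout=timeout,
    )
    return result.stdout.decode()


# Example (replace the app path with your own project directory):
# print(run_remover("/usr/share/android-sdk/lint", "/path/to/your/app"))
```

Keeping the command construction in its own function makes it easy to check the arguments without actually running lint.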
After the program runs, it creates an XML file, lint-result.xml, containing the lint results inside the directory you specified.
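Each issue element in the report holds its own location children, and one issue can have several locations. The sketch below uses a hypothetical two-issue report to show why it is safer to read locations from the issue itself than to pair two independent iterations such as zip(root, root.iter('location')): the zipped pairs drift out of step as soon as one issue has more than one location.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: the first issue has two locations, the second has one.
SAMPLE_REPORT = """<issues>
  <issue id="UnusedResources" severity="Warning" message="first" summary="s1">
    <location file="one.xml" line="1"/>
    <location file="two.xml" line="2"/>
  </issue>
  <issue id="HardcodedText" severity="Warning" message="second" summary="s2">
    <location file="three.java" line="3"/>
  </issue>
</issues>"""

root = ET.fromstring(SAMPLE_REPORT)

# Pairing issues with a separate iteration over all <location> elements:
# HardcodedText ends up matched with two.xml, which belongs to the first issue.
zipped = [(issue.get("id"), loc.get("file"))
          for issue, loc in zip(root, root.iter("location"))]

# Reading each issue's own <location> children keeps them together.
per_issue = [(issue.get("id"), [loc.get("file") for loc in issue.findall("location")])
             for issue in root]

print(zipped)
print(per_issue)
```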

Step 2: Organize the XML results into a CGI web page

Now that you have the result, you can move on to the second step: organizing the result into a Python CGI page. The following is the code I used in my project, so please edit or customize it according to your needs. In this scenario, the script receives the application directory path from another, main Python CGI program through a form field named path, and then reads lint-result.xml from that directory. Here is the code:
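On the sending side, one way for the main program to hand over the directory is a link with the path in the query string, since cgi.FieldStorage reads GET query strings as well as posted forms. This is a minimal sketch; the script location /cgi-bin/lint_report.py and the helper report_url are hypothetical names, and a POSTed HTML form with a field named path would work just as well.

```python
from urllib.parse import parse_qs, urlencode


def report_url(app_path, script="/cgi-bin/lint_report.py"):
    """Link to the report CGI script, passing the app directory as ?path=..."""
    return script + "?" + urlencode({"path": app_path})


url = report_url("/home/user/My App")
print("<a href='{}'>Show lint report</a>".format(url))

# The receiving side decodes the query string back into the original path:
print(parse_qs(url.split("?", 1)[1])["path"][0])  # /home/user/My App
```

urlencode percent-encodes the slashes and turns the space into "+", so paths with unusual characters survive the trip.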
#!/usr/bin/python3
import cgi
import cgitb
import html
import os
import xml.etree.ElementTree as ET

cgitb.enable()  # for debugging: show detailed tracebacks in the browser

form = cgi.FieldStorage()
path_lint = form.getvalue('path')  # get the path from the main Python CGI program
os.chdir(path_lint)
tree = ET.ElementTree(file="lint-result.xml")  # create an object to handle the result
root = tree.getroot()


def print_row(label, value, cell="td", color=None):
    """Print one two-column table row; both cells are HTML-escaped."""
    bg = " bgcolor='{}'".format(color) if color else ""
    print("<tr>")
    print("<{0} width='20%'{1}>{2}</{0}>".format(cell, bg, html.escape(label)))
    print("<{0} width='80%'{1}>{2}</{0}>".format(cell, bg, html.escape(value)))
    print("</tr>")


print("Content-type: text/html")
print()  # a blank line ends the HTTP headers
print("<h1 align='center'>Lint Analysis:</h1>")

for issue in root:  # each child of the root is one <issue>
    print("<table width='100%'>")
    print_row(issue.tag, "", cell="th", color="#6666ff")
    print_row("Id:", issue.attrib["id"])
    print_row("Severity:", issue.attrib["severity"])
    print_row("Message:", issue.attrib["message"])
    print_row("Summary:", issue.attrib["summary"])
    # Sometimes the result has no errorLine1 attribute. Its value is raw source text,
    # so it is escaped (like every other value) to display literally instead of as HTML
    if "errorLine1" in issue.attrib:
        print_row("Errorline:", issue.attrib["errorLine1"])
    # Take the location from the issue itself; zipping root with root.iter('location')
    # goes out of step as soon as one issue has more than one location
    location = issue.find("location")
    if location is not None:
        print_row(location.tag, "", color="#6666ff")
        print_row("Location:", location.attrib["file"])
    print("</table>")
    print("<br/>")
    print("<br/>")
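Escaping matters here because errorLine1 holds the raw source line that lint flagged, which in a layout file is itself XML markup; printed unescaped, the browser would try to render it as HTML. This small sketch uses a hypothetical table_row helper and a made-up errorLine1 value to show what html.escape produces.

```python
import html


def table_row(label, value):
    """Return one two-column row with both cells HTML-escaped."""
    return "<tr><td width='20%'>{}</td><td width='80%'>{}</td></tr>".format(
        html.escape(label), html.escape(value))


# A made-up errorLine1 value from a layout file
row = table_row("Errorline:", '<TextView android:text="Hello" />')
print(row)
```

The value cell comes out as &lt;TextView android:text=&quot;Hello&quot; /&gt;, which the browser displays as the literal source line.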
Step 3: Visualize the Lint results in a graph

Now, this is the tricky part. You cannot simply put the graph on the CGI website. According to the SciPy Cookbook (http://scipy-cookbook.readthedocs.io), "Trying to use matplotlib in a python CGI script naïvely will most likely result in the following error:"
...352, in _get_configdir
    raise RuntimeError("'%s' is not a writable dir; you must set environment variable HOME to be a writable dir "%h)
RuntimeError: '<WebServer DocumentRoot>' is not a writable dir; you must set environment variable HOME to be a writable dir
Matplotlib needs the environment variable HOME to point to a writable directory. One way to accomplish this is to set this environment variable from within the CGI script at runtime (another way would be to modify the file, but that would not be as portable). You can check the code at this link: http://scipy-cookbook.readthedocs.io/items/Matplotlib_Using_MatPlotLib_in_a_CGI_script.html

Here is the code that I used to generate the bar graph. (Note: this code and the Step 2 code are used in the same CGI program. I broke the program into two parts to make it easier to explain what each section does.)
#!/usr/bin/python3
import cgi
import os
import xml.etree.ElementTree as ET
from collections import Counter

form = cgi.FieldStorage()
path_lint = form.getvalue('path')  # same app directory path as in Step 2
os.chdir(path_lint)

# matplotlib checks for a writable HOME when it is imported, so set HOME first
os.environ['HOME'] = path_lint
import matplotlib
matplotlib.use('Agg')  # choose a non-GUI backend (must happen before importing pyplot)
import matplotlib.pyplot as plt

tree = ET.ElementTree(file="lint-result.xml")  # create an object to handle the result
root = tree.getroot()

# Count how many times each issue id occurs (exact matches, in first-seen order)
counts = Counter(issue.attrib["id"] for issue in root)
data1 = list(counts)           # unique issue ids
data2 = list(counts.values())  # number of occurrences of each id

# The bars are drawn at numeric x positions; xticks maps them to short labels A0, A1, ...
data_x = list(range(len(data1)))
data_x1 = ['A' + str(x) for x in data_x]
plt.xticks(data_x, data_x1)
plt.bar(data_x, data2)
plt.savefig("chartlint.png", format='png')  # save the figure into the app directory

Link to a picture of the result: https://imgur.com/a/aiSCx
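Because the bars are labeled A0, A1, and so on, readers of the page need a key that maps each label back to its lint issue id. A minimal sketch, assuming the bars are built from the unique ids in order of first appearance (as the loop above does); chart_legend and the id list are hypothetical.

```python
from collections import Counter


def chart_legend(issue_ids):
    """Return (label, issue id, count) rows that match the A0, A1, ... x-axis labels."""
    counts = Counter(issue_ids)  # keys keep first-seen order on Python 3.7+
    return [("A" + str(i), issue_id, n)
            for i, (issue_id, n) in enumerate(counts.items())]


# Hypothetical list of issue ids, as they might be read from lint-result.xml
ids = ["HardcodedText", "UnusedResources", "HardcodedText"]
print("<table>")
for label, issue_id, n in chart_legend(ids):
    print("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(label, issue_id, n))
print("</table>")
```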
Step 4: Assemble the code and put it into the Python CGI web page

For those of you who are a little confused about how to put the code together, here is the full code that I use in my project:
#!/usr/bin/python3
import cgi
import cgitb
import html
import os
import xml.etree.ElementTree as ET
from collections import Counter

cgitb.enable()  # show detailed tracebacks in the browser for debugging

form = cgi.FieldStorage()
path_lint = form.getvalue('path')  # app directory path sent by the main Python CGI program
os.chdir(path_lint)
tree = ET.ElementTree(file="lint-result.xml")
root = tree.getroot()


def print_row(label, value, cell="td", color=None):
    """Print one two-column table row; both cells are HTML-escaped."""
    bg = " bgcolor='{}'".format(color) if color else ""
    print("<tr>")
    print("<{0} width='20%'{1}>{2}</{0}>".format(cell, bg, html.escape(label)))
    print("<{0} width='80%'{1}>{2}</{0}>".format(cell, bg, html.escape(value)))
    print("</tr>")


print("Content-type: text/html")
print()  # a blank line ends the HTTP headers
print("<h1 align='center'>Lint Analysis:</h1>")

# Part 1 (Step 3): bar graph of how often each issue id occurs
os.environ['HOME'] = path_lint  # set a writable HOME before importing matplotlib
import matplotlib
matplotlib.use('Agg')  # non-GUI backend, chosen before importing pyplot
import matplotlib.pyplot as plt

counts = Counter(issue.attrib["id"] for issue in root)
data1 = list(counts)           # unique issue ids
data2 = list(counts.values())  # number of occurrences of each id
data_x = list(range(len(data1)))
data_x1 = ['A' + str(x) for x in data_x]
plt.xticks(data_x, data_x1)
plt.bar(data_x, data2)  # bar graph, as in Step 3
plt.savefig("chartlint.png", format='png')

# Part 2 (Step 2): one HTML table per issue
for issue in root:
    print("<table width='100%'>")
    print_row(issue.tag, "", cell="th", color="#6666ff")
    print_row("Id:", issue.attrib["id"])
    print_row("Severity:", issue.attrib["severity"])
    print_row("Message:", issue.attrib["message"])
    print_row("Summary:", issue.attrib["summary"])
    if "errorLine1" in issue.attrib:  # not every issue has an error line
        print_row("Errorline:", issue.attrib["errorLine1"])
    location = issue.find("location")  # the location belonging to this issue
    if location is not None:
        print_row(location.tag, "", color="#6666ff")
        print_row("Location:", location.attrib["file"])
    print("</table>")
    print("<br/>")
    print("<br/>")
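The script saves chartlint.png into the app directory but does not itself show it on the page, and that directory may not be reachable from the web server. One option, a swapped-in technique rather than part of the original project, is to embed the PNG directly in the HTML as a base64 data URI. The helper png_img_tag is hypothetical; note that base64 makes the image data roughly a third larger.

```python
import base64
import html
import os


def png_img_tag(png_bytes, alt="Lint issue chart"):
    """Return an <img> tag that embeds PNG bytes as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "<img src='data:image/png;base64,{}' alt='{}'>".format(
        encoded, html.escape(alt))


# In the CGI script, after plt.savefig("chartlint.png", ...):
if os.path.exists("chartlint.png"):
    with open("chartlint.png", "rb") as f:
        print(png_img_tag(f.read()))
```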
That's all! Working with ElementTree and matplotlib may feel slow going if you are using them for the first time, but just give it a try and you will get the hang of it once you grasp the essentials. As a final note and warning, be careful when you put a lot of data into one graph: by default, the graph does not adapt to the size of the data, so it will probably look cramped and not give you a nice output. You will want to customize the matplotlib graph to handle large data.
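One way to make the figure adapt is to let its width grow with the number of bars and rotate the tick labels. The sketch below is a starting point, not a tuned solution: the helper names figure_size and save_bar_chart are hypothetical, and the 0.4 inches per bar, 6-inch minimum width and 4-inch height are arbitrary assumptions. matplotlib is imported inside the function so that HOME (or other environment setup) can be prepared before the import, as in Step 3.

```python
def figure_size(n_bars, inches_per_bar=0.4, min_width=6.0, height=4.0):
    """Return a (width, height) figsize in inches that grows with the number of bars."""
    return (max(min_width, n_bars * inches_per_bar), height)


def save_bar_chart(labels, counts, out_path="chartlint.png"):
    """Draw one bar per label, sized to fit, and save it as a PNG."""
    import matplotlib
    matplotlib.use("Agg")  # non-GUI backend, as in the CGI script
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=figure_size(len(labels)))
    positions = list(range(len(labels)))
    ax.bar(positions, counts)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=90)  # vertical labels need less horizontal room
    fig.tight_layout()  # keep rotated labels inside the saved image
    fig.savefig(out_path, format="png")
    plt.close(fig)  # free the figure's memory
```

With these defaults, a handful of bars keeps the 6-inch minimum width, while 50 bars produce a figure about 20 inches wide.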